Samuel Akopyan

Posted on Feb 9

AI for PHP Developers. Practical Use of TransformersPHP

#php #machinelearning #ai #webdev

AI in PHP: Not Theory, but a Real Starting Point

In my previous articles, I described at a fairly high level why AI seems to be everywhere, yet barely intersects with everyday PHP development. Not because PHP is "unsuitable", but because the conversation usually happens outside our tasks and our habitual way of thinking. And, of course, because there is very little material that explains AI specifically for PHP developers - their problems and their mindset.

After the publication, I was asked the same question several times, in different forms:

Okay, suppose that's true. Where do you actually start?

And this is probably the most interesting question I received. Below, I'll try to answer it.

About the "Entry Point"

When you hear "using AI in a project", your mind likely jumps to a lot of unnecessary things at once: infrastructure, model training, experiments, separate services, new roles in the team, and so on.

But honestly, in most PHP projects we simply don't need all that (even though figuring it out is a fascinating task in itself).

In practice, we do not need to:

train models from scratch
understand gradient descent
collect datasets, etc.

It's important to understand that AI ≠ Data Science. My friends, in most PHP projects no one trains models. Please remember this! At least when we're talking about applied PHP projects that use ready-made models, not research or model training.

What we actually need is to take an existing tool and carefully integrate it into the current logic of the application - just like we do with any other library.

And this is where things get interesting.

What Is TransformersPHP, and What Do You Do With It?

Everyone already knows about using LLMs via APIs. It's useful and convenient, but in that form AI remains something external - a service you send text to. What's inside is a black box.

At some point, I wanted to look at a more "down-to-earth" level: not text generation, but semantic representation, meaning embeddings, classification, and search. This is where most applied backend problems are actually solved - text search, comparison, automatic classification - not in human–model dialogue.

And here it suddenly turned out that there are tools that let you do this directly in PHP, without a Python stack. One of them is TransformersPHP.

It's important to clarify right away: this is not an attempt to turn PHP into Python, nor a universal solution. It's a library for inference - using already trained models.

In my view, TransformersPHP is one of the most interesting and illustrative projects in the modern PHP ML ecosystem. Special thanks to its author, Kyrian Obikwelu, for creating and maintaining it.

In short, it's a library that allows you to use transformer models (BERT, RoBERTa, DistilBERT, and others) directly from PHP - without Python and without external APIs.

Essentially, it's a PHP-oriented wrapper around Hugging Face Transformers ideas, adapted for the PHP ecosystem and real-world use cases.

The Key Feature: Local Inference

Models are loaded and executed inside the PHP application itself (via ONNX Runtime). This opens up important architectural possibilities:

no network calls to LLM APIs
full control over data (important for privacy-sensitive environments)
predictable and stable request latency
offline operation (after the first run and model download)

TransformersPHP supports common NLP tasks such as embeddings, text classification, semantic similarity, and more.

What I Like Most

The most valuable feeling is the absence of context switching.

I stay in PHP. I write the same code. I deploy the same way. I use the same architectural approaches. Nothing changes.

In this setup, the model is not some "magical object", but just another data source. Unusual, yes - but no more mysterious than any other.

And that completely changes your attitude toward the topic. Don't you agree?

Running a Model in 10 Minutes

The first launch is critical.
If it's complicated, that's usually where everything ends.

With TransformersPHP, the feeling is the opposite: it's more like working with a regular dependency.

The full installation guide is available in the documentation.
For simplicity, we'll run everything in Docker with all dependencies included.

Yes, the Dockerfile looks large - but it's one-time infrastructure. The actual launch and first demo really take just a few minutes.

Requirements

PHP 8.1 or higher
Composer
PHP FFI extension
JIT compilation (optional, for performance)
Increased memory limit (for heavier tasks like text generation)
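
Before building anything, it can help to confirm that the current PHP runtime meets these requirements. The script below is a minimal sketch that uses only built-in PHP functions; the function name checkTransformersRequirements() and the shape of the returned array are assumptions made for illustration, not part of TransformersPHP.

```php
<?php
// Minimal environment check for the requirements listed above.
// checkTransformersRequirements() is a hypothetical helper, not a library API.

function checkTransformersRequirements(): array
{
    return [
        'php_version'  => PHP_VERSION,
        'php_ok'       => version_compare(PHP_VERSION, '8.1.0', '>='),
        'ffi_loaded'   => extension_loaded('ffi'),
        // ini_get() returns false when a directive is unknown
        // (for example, when OPcache is not installed).
        'ffi_enable'   => ini_get('ffi.enable'),
        'opcache_jit'  => ini_get('opcache.jit'),
        'memory_limit' => ini_get('memory_limit'),
    ];
}

foreach (checkTransformersRequirements() as $name => $value) {
    echo str_pad($name, 14) . ': ' . var_export($value, true) . PHP_EOL;
}
```

Running it inside the container (rather than on the host) shows the values the application will actually see.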

Project Structure

/demo/
  ├── app/
  │    ├── demo.php
  │    ├── semantic-search.php
  ├── docker/
  │    ├── Dockerfile
  ├── docker-compose.yml
  ├── composer.json

Installation (Docker)

File: docker-compose.yml

networks:
  ai-for-php-developers:
    driver: bridge

services:
  app:
    build:
      context: .
      dockerfile: docker/Dockerfile
    volumes:
      - .:/var/www
    ports:
      - "8088:8088"
    command: php -S 0.0.0.0:8088 -t app
    networks:
      - ai-for-php-developers

File: docker/Dockerfile

FROM php:8.2-fpm

# ------------------------------
# Install system dependencies
# ------------------------------
RUN apt-get update && apt-get install -y \
    libzip-dev \
    zip \
    unzip \
    git \
    libxml2-dev \
    libcurl4-openssl-dev \
    libpng-dev \
    libonig-dev \
    && rm -rf /var/lib/apt/lists/*

# ------------------------------
# Install PHP extensions
# ------------------------------
RUN docker-php-ext-install zip pdo_mysql bcmath xml mbstring curl gd pcntl

# ------------------------------
# Enable FFI
# ------------------------------
RUN apt-get update && apt-get install -y \
    libffi-dev \
    pkg-config \
    && rm -rf /var/lib/apt/lists/*
RUN docker-php-ext-install ffi
RUN echo "ffi.enable=1" > /usr/local/etc/php/conf.d/ffi.ini

# ------------------------------
# Install ONNX Runtime
# ------------------------------
ENV ONNXRUNTIME_VERSION=1.17.1

RUN curl -L https://github.com/microsoft/onnxruntime/releases/download/v${ONNXRUNTIME_VERSION}/onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}.tgz \
    | tar -xz \
    && cp onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}/lib/libonnxruntime.so* /usr/lib/ \
    && ldconfig \
    && rm -rf onnxruntime-linux-x64-${ONNXRUNTIME_VERSION}

# ------------------------------
# Install Composer
# ------------------------------
COPY --from=composer:latest /usr/bin/composer /usr/bin/composer

# Set working directory
WORKDIR /var/www

# Copy existing application directory contents
COPY . /var/www

# Configure PHP
RUN echo "memory_limit = 512M" >> /usr/local/etc/php/conf.d/docker-php-ram-limit.ini
RUN echo "max_execution_time = 300" >> /usr/local/etc/php/conf.d/docker-php-max-execution-time.ini

# Expose port 9000 and start php-fpm server
EXPOSE 9000
CMD ["php-fpm"]

File: composer.json

{
  "type": "project",
  "minimum-stability": "stable",
  "prefer-stable": true,
  "require": {
    "codewithkyrian/transformers": "~0.6.2"
  },
  "config": {
    "allow-plugins": {
      "codewithkyrian/platform-package-installer": true
    }
  }
}

Run the Environment

docker compose build --pull
docker compose up -d
docker compose exec app /bin/bash -c "composer install"

The idea is simple: the example should run locally without manual environment setup.

Basic Demo Example

Let's start with the simplest task: sentiment analysis.
If everything above worked correctly, you can run the example below (note that the first run may take a few seconds):

docker compose exec app php app/demo.php

File: app/demo.php

<?php

require_once __DIR__ . '/../vendor/autoload.php';

use function Codewithkyrian\Transformers\Pipelines\pipeline;

// Create a sentiment analysis pipeline
$classifier = pipeline('sentiment-analysis');

$out = $classifier(['I love transformers!']);
print_r($out);

$out = $classifier(['I hate transformers!']);
print_r($out);

The result is exactly what you'd expect.

I love transformers!
Array ( 
  [label] => POSITIVE 
  [score] => 0.99978870153427 
)

I hate transformers!
Array ( 
  [label] => NEGATIVE 
  [score] => 0.99863630533218 
)

The purpose of this example is purely psychological: to remove the barrier.
A model in PHP is just another object you can work with.
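
To make "just another object" concrete, here is a small sketch of how such a result might feed ordinary application logic. It assumes the label/score shape shown in the output above; the needsModeration() function and the 0.9 threshold are invented for illustration and would need tuning against real data.

```php
<?php
// Sketch: turning a sentiment result into an application decision.
// The result shape (label + score) follows the output shown above;
// needsModeration() and the 0.9 threshold are illustrative assumptions.

function needsModeration(array $result, float $threshold = 0.9): bool
{
    return ($result['label'] ?? null) === 'NEGATIVE'
        && (float) ($result['score'] ?? 0.0) >= $threshold;
}

// Results copied from the demo output above.
$samples = [
    'I love transformers!' => ['label' => 'POSITIVE', 'score' => 0.99978870153427],
    'I hate transformers!' => ['label' => 'NEGATIVE', 'score' => 0.99863630533218],
];

foreach ($samples as $text => $result) {
    echo $text . ' => ' . (needsModeration($result) ? 'review' : 'ok') . PHP_EOL;
}
```

The model output is treated like any other input to a business rule: a plain array, a comparison, a boolean.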

Architectural Role of TransformersPHP

This library does not compete with large LLM services like GPT or Claude. It covers a different, very important layer:

fast embeddings
local classification
semantic search
lightweight NLP without external dependencies

In combination with PHP, this makes perfect sense. PHP remains the core business-logic layer, while transformers become an embedded tool rather than a remote service.

TransformersPHP is a great example of how modern ML is gradually becoming less "foreign" to PHP and more native to its ecosystem - through careful engineering bridges like ONNX.

Real Case: Semantic Search Over Events

Tutorial examples are nice, but they get boring quickly. Let's look at a task that actually occurs in backend systems.

Run the example:

docker compose exec app php app/semantic-search.php

Scenario

We have events with short textual descriptions. A user searches for:

"sanctions against IT companies"
"space race among countries in the region"

These exact phrases may not appear in the data at all. Traditional keyword search starts to struggle: synonyms, morphology, different languages, hacks on top of hacks.

A Thought Experiment

Suppose we have an event feed like this:

new restrictions introduced against technology corporations
regional countries increase investments in satellite programs
escalation of political conflict in several provinces

Now the user searches: "space race among countries in the region".

None of those words has to appear literally in the data. And that's normal - people don't think the way systems store text.

Solution Idea

Instead of guessing words, we search by meaning - in an engineering sense:

event description → vector
user query → vector
find nearest vectors

Embeddings become a universal index of meaning.
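
The mechanics can be shown with a toy example before any model is involved. The three-dimensional vectors below are invented by hand purely for illustration (real embeddings come from a model and have hundreds of dimensions), and the cosine() helper is a hypothetical name used only in this sketch.

```php
<?php
// Toy illustration of "search by meaning" with hand-made 3-dimensional vectors.
// The numbers are invented to show the mechanics; they are not model output.

function cosine(array $a, array $b): float
{
    $dot = 0.0;
    $na = 0.0;
    $nb = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
        $na  += $x * $x;
        $nb  += $b[$i] * $b[$i];
    }

    return ($na > 0.0 && $nb > 0.0) ? $dot / (sqrt($na) * sqrt($nb)) : 0.0;
}

// Imagined axes: [technology, space, politics]
$events = [
    'restrictions against technology corporations' => [0.9, 0.0, 0.4],
    'investments in satellite programs'            => [0.2, 0.9, 0.3],
    'political conflict in several provinces'      => [0.0, 0.1, 0.9],
];
$query = [0.1, 0.8, 0.4]; // stands in for "space race among countries in the region"

$scores = [];
foreach ($events as $text => $vector) {
    $scores[$text] = cosine($query, $vector);
}
arsort($scores); // highest similarity first

echo array_key_first($scores) . PHP_EOL; // investments in satellite programs
```

The query shares no words with the winning event; it wins only because its vector points in a similar direction.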

Steps

Take events {id, title, description}
Compute embeddings only for description
Embed the user query
Find nearest vectors
Sort and return results

No model training. No complex infrastructure.
Event embeddings are computed once and stored anywhere.
Queries are embedded at search time.

And… voilà.
No model training.
No magic.
Just a different way to represent text.

Operation Logic

We'll place the operation logic in a separate class, SemanticEventSearch. This class doesn't claim to be the best code on the planet, and is written for demonstration purposes only - so we'll omit any comments on its quality.

final class SemanticEventSearch
{
    private string $model = 'Xenova/paraphrase-multilingual-MiniLM-L12-v2';
    private string $cachePath;
    private string $defaultQuery = 'sanctions against IT companies';
    private int $topN;

    private ?string $query = null;

    /** @var list<array{id:int,title:string,description:string}> */
    private array $events;

    /** @var array<int, list<float|int>> */
    private array $eventEmbeddingsById = [];

    private $embedder;

    /**
     * Create a new semantic search instance.
     *
     * @param int $topN Number of results to return.
     */
    public function __construct(int $topN = 3)
    {
        $this->cachePath = __DIR__ . '/../embeddings.events.json';
        $this->events = [];
        $this->embedder = null;
        $this->topN = $topN;
    }

    /**
     * Inject events that will be indexed/searched.
     *
     * @param list<array{id:int,title:string,description:string}> $events
     * @return $this
     */
    public function setEvents(array $events): self
    {
        $this->events = $events;
        $this->eventEmbeddingsById = [];
        return $this;
    }

    /**
     * Set the embeddings model identifier.
     *
     * Switching model invalidates in-memory embeddings.
     *
     * @param string $model
     * @return $this
     */
    public function setModel(string $model): self
    {
        $this->model = $model;
        $this->embedder = null;
        $this->eventEmbeddingsById = [];
        return $this;
    }

    /**
     * Set the query to be searched.
     *
     * @param string $query
     * @return $this
     */
    public function setQuery(string $query): self
    {
        $q = trim($query);
        $this->query = $q === '' ? null : $q;
        return $this;
    }

    /**
     * Run the end-to-end semantic search pipeline (cache -> embed query -> score -> top-N).
     *
     * @return array{query:string,results:list<array{score:float,event:array{id:int,title:string,description:string}}>}
     * @throws RuntimeException If events are not set or embeddings output is unexpected.
     */
    public function run(): array
    {
        if (count($this->events) === 0) {
            throw new RuntimeException('Events list is empty. Call setEvents() before run().');
        }

        if ($this->embedder === null) {
            $this->embedder = pipeline('embeddings', $this->model);
        }

        $this->loadEmbeddingsFromCacheIfCompatible();
        $this->ensureAllEventEmbeddings();

        $query = $this->query ?? $this->defaultQuery;
        $queryVec = $this->embedText($query);

        $results = $this->search($queryVec);
        return [
            'query' => $query,
            'results' => $results,
        ];
    }

    /**
     * Compute an embedding vector for a single text.
     *
     * @param string $text
     * @return list<float|int>
     * @throws RuntimeException
     */
    private function embedText(string $text): array
    {
        $emb = ($this->embedder)($text, normalize: true, pooling: 'mean');
        if (!is_array($emb) || !isset($emb[0]) || !is_array($emb[0])) {
            throw new RuntimeException('Unexpected embeddings output format');
        }

        return $emb[0];
    }

    /**
     * Cosine similarity between two vectors.
     *
     * @param list<float|int> $a
     * @param list<float|int> $b
     * @return float
     */
    private function cosineSimilarity(array $a, array $b): float
    {
        $n = min(count($a), count($b));

        $dot = 0.0;
        $normA = 0.0;
        $normB = 0.0;

        for ($i = 0; $i < $n; $i++) {
            $x = (float) $a[$i];
            $y = (float) $b[$i];

            $dot += $x * $y;
            $normA += $x * $x;
            $normB += $y * $y;
        }

        if ($normA <= 0.0 || $normB <= 0.0) {
            return 0.0;
        }

        return $dot / (sqrt($normA) * sqrt($normB));
    }

    /**
     * Load a JSON file and decode to array.
     *
     * @param string $path
     * @return array|null
     */
    private function loadJsonFile(string $path): ?array
    {
        if (!is_file($path)) {
            return null;
        }

        $raw = file_get_contents($path);
        if ($raw === false) {
            return null;
        }

        $data = json_decode($raw, true);
        return is_array($data) ? $data : null;
    }

    /**
     * Encode and save data to JSON file.
     *
     * @param string $path
     * @param array $data
     * @throws RuntimeException
     */
    private function saveJsonFile(string $path, array $data): void
    {
        $json = json_encode($data, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
        if ($json === false) {
            throw new RuntimeException('Failed to encode JSON');
        }

        $ok = file_put_contents($path, $json);
        if ($ok === false) {
            throw new RuntimeException('Failed to write cache file: ' . $path);
        }
    }

    /**
     * Load cached event embeddings only if they were produced by the current model.
     *
     * @return void
     */
    private function loadEmbeddingsFromCacheIfCompatible(): void
    {
        $cached = $this->loadJsonFile($this->cachePath);

        if (!is_array($cached) || !isset($cached['model'], $cached['events']) || !is_array($cached['events'])) {
            return;
        }

        if ($cached['model'] !== $this->model) {
            return;
        }

        foreach ($cached['events'] as $row) {
            if (isset($row['id'], $row['embedding']) && is_array($row['embedding'])) {
                $this->eventEmbeddingsById[(int) $row['id']] = $row['embedding'];
            }
        }
    }

    /**
     * Ensure embeddings exist for all events and persist them to cache.
     *
     * @return void
     * @throws RuntimeException
     */
    private function ensureAllEventEmbeddings(): void
    {
        $missing = [];
        foreach ($this->events as $event) {
            $id = (int) $event['id'];
            if (!isset($this->eventEmbeddingsById[$id])) {
                $missing[] = $event;
            }
        }

        if (count($missing) === 0) {
            return;
        }

        foreach ($missing as $event) {
            $id = (int) $event['id'];
            $text = (string) $event['description'];

            $this->eventEmbeddingsById[$id] = $this->embedText($text);
        }

        $toCache = [
            'model' => $this->model,
            'events' => array_values(array_map(
                fn(array $event): array => [
                    'id' => (int) $event['id'],
                    'embedding' => $this->eventEmbeddingsById[(int) $event['id']],
                ],
                $this->events
            )),
        ];

        $this->saveJsonFile($this->cachePath, $toCache);
    }

    /**
     * Score all events against the query embedding and return the top-N results.
     *
     * @param list<float|int> $queryVec
     * @return list<array{score:float,event:array{id:int,title:string,description:string}}>
     */
    private function search(array $queryVec): array
    {
        $scored = [];

        foreach ($this->events as $event) {
            $id = (int) $event['id'];
            $score = $this->cosineSimilarity($queryVec, $this->eventEmbeddingsById[$id]);

            $scored[] = [
                'score' => $score,
                'event' => $event,
            ];
        }

        usort($scored, static fn(array $a, array $b): int => $b['score'] <=> $a['score']);

        return array_slice($scored, 0, $this->topN);
    }

    /**
     * Render results as plain text.
     *
     * @param string $query
     * @param list<array{score:float,event:array{id:int,title:string,description:string}}> $results
     * @return void
     */
    public function render(string $query, array $results): void
    {
        echo "Query: {$query}\n\n";
        foreach ($results as $row) {
            $event = $row['event'];
            $score = (float) $row['score'];

            echo "[" . number_format($score, 4) . "] #{$event['id']} {$event['title']}\n";
            echo "  {$event['description']}\n\n";
        }
    }
}
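
One observation about cosineSimilarity() in the class above: the embedder is called with normalize: true, so, assuming the returned vectors are already unit length, cosine similarity reduces to a plain dot product and the square roots become redundant. The sketch below demonstrates that identity with hand-made vectors; l2Normalize() and dot() are hypothetical helpers, not part of the class or the library.

```php
<?php
// Sketch: for unit-length vectors, cosine similarity equals the dot product.
// l2Normalize() and dot() are illustrative helpers, not part of the class above.

function l2Normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(static fn($x) => $x * $x, $v)));

    return $norm > 0.0 ? array_map(static fn($x) => $x / $norm, $v) : $v;
}

function dot(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $sum += $x * $b[$i];
    }

    return $sum;
}

$a = l2Normalize([3.0, 4.0]); // [0.6, 0.8]
$b = l2Normalize([4.0, 3.0]); // [0.8, 0.6]

// Cosine of the original vectors is 24 / (5 * 5) = 0.96,
// and the dot product of the normalized ones gives the same value.
echo round(dot($a, $b), 4) . PHP_EOL; // 0.96
```

Keeping the full cosine formula, as the class does, is the safer choice when it is unclear whether cached vectors were normalized.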

Preparing the Data

Let's assume this is our data, collected from various sources. For simplicity, let's place it in an array.

$events = [
    [
        'id' => 1,
        'title' => 'Restrictions Against Technology Corporations',
        'description' => 'New economic measures have been introduced against major technology companies.',
    ],
    [
        'id' => 2,
        'title' => 'Development of Space Programs',
        'description' => 'Several countries in the region have increased funding for national satellite projects.',
    ],
    [
        'id' => 3,
        'title' => 'Escalation of Political Conflict',
        'description' => 'An escalation of politically motivated conflict has occurred in several provinces.',
    ],
    [
        'id' => 4,
        'title' => 'Restrictions Against the IT Sector',
        'description' => 'The government announced new restrictions for companies operating in the information technology sector.',
    ],
    [
        'id' => 5,
        'title' => 'Rising Inflation and Key Interest Rate Review',
        'description' => 'The central bank raised the key interest rate amid accelerating inflation and rising prices of imported goods.',
    ],
    [
        'id' => 6,
        'title' => 'Launch of a Small Business Support Program',
        'description' => 'Authorities announced preferential loans and tax incentives for small and medium-sized businesses in the regions.',
    ],
    [
        'id' => 8,
        'title' => 'Data Breach in Online Retail',
        'description' => "An online retailer is investigating a leak of customers' personal data following the compromise of employee accounts.",
    ],
    [
        'id' => 9,
        'title' => 'Medical Breakthrough: New Diagnostic Method',
        'description' => 'Researchers presented a method for early disease diagnosis using biomarkers, significantly reducing analysis time.',
    ],
    [
        'id' => 10,
        'title' => 'Seasonal Increase in Illness Rates',
        'description' => 'Several cities have reported a rise in respiratory infections, with clinics increasing patient intake.',
    ],
    [
        'id' => 12,
        'title' => 'Drought and Risks to Agriculture',
        'description' => 'Due to prolonged drought, farmers predict lower crop yields, and support measures for agriculture are being discussed.',
    ],
    [
        'id' => 13,
        'title' => 'Final of a Major Sports Tournament',
        'description' => 'In the decisive match of the season, the team secured victory in extra time, setting a new attendance record.',
    ],
    [
        'id' => 14,
        'title' => 'Player Transfer and Squad Reinforcement',
        'description' => 'The club signed a new striker, aiming to strengthen its attacking line ahead of a series of derby matches.',
    ],
    [
        'id' => 15,
        'title' => 'New Rules for Marketplaces',
        'description' => 'The regulator proposed requirements for product labeling and commission transparency on online trading platforms.',
    ],
    [
        'id' => 17,
        'title' => 'Disruptions in Semiconductor Supply',
        'description' => 'Electronics manufacturers warned of chip supply delays due to export restrictions and factory overloads.',
    ],
    [
        'id' => 18,
        'title' => 'Opening of a Contemporary Art Festival',
        'description' => 'A contemporary art festival has opened in the capital, featuring exhibitions, performances, and artist lectures.',
    ],
    [
        'id' => 19,
        'title' => 'Major Deal in the Real Estate Market',
        'description' => 'An investment fund acquired a portfolio of commercial real estate, planning renovation and improved energy efficiency.',
    ],
    [
        'id' => 20,
        'title' => 'Ocean Research and New Findings',
        'description' => 'A scientific expedition collected data on ocean currents and water temperature, refining climate change forecasts.',
    ],
];

Run the example

It's simple here: just run our code and wait for the results.

<?php

require_once __DIR__ . '/../vendor/autoload.php';

use function Codewithkyrian\Transformers\Pipelines\pipeline;

final class SemanticEventSearch {...}

$events = [...];

$query = 'sanctions against IT companies';

$search = new SemanticEventSearch(topN: 3);
$search->setModel('Xenova/paraphrase-multilingual-MiniLM-L12-v2');
$search->setEvents($events);
$search->setQuery($query);
$out = $search->run();
$search->render($out['query'], $out['results']);

Result

The results match by meaning, not by exact words. The first result is the one most relevant to our query.

Query: sanctions against IT companies
[0.4288] #4 Restrictions against the IT sector
  The government announced new restrictions for companies operating in the IT industry.
[0.3356] #15 New rules for marketplaces
  The regulator proposed requirements for product labeling and commission transparency.
[0.2598] #8 Data breach in online retail
  An online store investigates a customer data leak after account compromise.

Production Use Cases

This already looks like a product approach, not an experiment.
And it has useful properties:

weak dependence on exact wording
works well with short texts
easily combined with classic filters (date, region, type)
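
As a sketch of the last point, classic filters can narrow the candidate set and similarity can then order what remains. The region and date fields, the precomputed score values, and the filterThenRank() function are all hypothetical; in the class above, the score would come from cosineSimilarity().

```php
<?php
// Sketch: apply classic filters first, then rank the survivors by similarity.
// The 'region' and 'date' fields and the 'score' values are invented for
// illustration; they are not part of the event structure used earlier.

function filterThenRank(array $scoredEvents, string $region, string $fromDate, int $topN): array
{
    // ISO 8601 dates (YYYY-MM-DD) sort correctly as plain strings.
    $candidates = array_filter(
        $scoredEvents,
        static fn(array $e): bool => $e['region'] === $region
            && strcmp($e['date'], $fromDate) >= 0
    );

    usort($candidates, static fn(array $a, array $b): int => $b['score'] <=> $a['score']);

    return array_slice($candidates, 0, $topN);
}

$scoredEvents = [
    ['id' => 1, 'region' => 'EU', 'date' => '2025-01-10', 'score' => 0.41],
    ['id' => 2, 'region' => 'EU', 'date' => '2024-11-02', 'score' => 0.55],
    ['id' => 3, 'region' => 'US', 'date' => '2025-02-01', 'score' => 0.62],
    ['id' => 4, 'region' => 'EU', 'date' => '2025-02-03', 'score' => 0.33],
];

$top = filterThenRank($scoredEvents, 'EU', '2025-01-01', 2);
echo implode(', ', array_column($top, 'id')) . PHP_EOL; // 1, 4
```

In a real system the filter would more likely run in SQL before any similarity is computed, so that only the remaining candidates need to be scored.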

This is not "AI for the sake of AI", but practical, explainable, debuggable logic.

Limitations and Caveats

This is not a silver bullet.

Models are large. Performance matters. Some problems are still better solved with plain SQL.

But that's a normal engineering discussion - about trade-offs, not magic or black boxes.

Where to Go Next

For me, TransformersPHP is a strong example that AI can be used directly in PHP projects without changing the stack or introducing Python.

In my open and free book "AI for PHP Developers", I explore similar cases: when it makes sense, how to choose models, and how not to turn a project into a collection of experimental features.

All examples can be downloaded and run using a ready-made Docker environment from the GitHub repository:
https://github.com/apphp/ai-for-php-developers-examples

You can also run all examples from the book directly:
https://aiwithphp.org/books/ai-for-php-developers/examples

If the topic resonates, I'd be glad to discuss it and hear your feedback.
I'm especially interested in what tasks you already solve (or want to solve) with AI in PHP projects.

DEV Community