
Performance of UUIDs

Frédéric Bouchery

Recently, while analyzing the performance of an application with Blackfire, we found that a significant amount of time was spent converting binary UUIDs to strings and vice versa.

Note that the UUIDs are stored in the database in binary format and that the application exposes a REST API, so UUIDs are constantly converted to and from strings of the form "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", either to build the JSON response or to process requests containing UUIDs.
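To make the two representations concrete, here is a minimal sketch using only PHP built-ins. The sample UUID value is arbitrary and chosen purely for illustration; it does not come from the application.

```php
<?php
declare(strict_types=1);

// Arbitrary sample value, used only for illustration.
$text = '123e4567-e89b-12d3-a456-426614174000';

// Binary form: strip the four dashes, then decode the 32 hex digits.
$bytes = hex2bin(str_replace('-', '', $text));

echo strlen($text), " characters as text\n";  // 32 hex digits + 4 dashes
echo strlen($bytes), " bytes as binary\n";    // 128 bits
```

The binary form is what sits in the database column, while the text form is what travels through the API.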

Like most developers, the application's developers relied on the "Ramsey/UUID" package, which seems to be the most commonly used solution for handling UUIDs. The component's excellent reputation led us to look for optimizations elsewhere in the application, without questioning whether Ramsey/UUID was the right choice.

Then, while reading JoliCode's article "UUID generation in PHP" and the discussion on a Symfony UID pull request, we took a closer look at how our application manipulates UUIDs because, as Grégoire Pineau says, if Symfony's polyfill is as efficient, we might have to reconsider the use of Ramsey/UUID.

We profiled the application and found that we spend far more time encoding and decoding UUIDs than generating them. This is probably a common situation for many applications.

In this context, is JoliCode's benchmark still relevant, since it compares the generation of UUIDs?

So we ran another benchmark, this time comparing the operation that matters to us: converting a binary UUID to a string and a string back to binary.

With Ramsey/UUID, this round trip consists of creating a Uuid object from a binary string and then converting it back to binary:

<?php
\Ramsey\Uuid\Uuid::fromBytes($bytes)->getBytes();

We created a Uuid class, designed as a value object, that uses the PHP "uuid" extension:

<?php
final class Uuid {
    /**
     * @var string
     */
    private $uuid;

    public function __construct(string $uuid)
    {
        if (\uuid_is_valid($uuid) === false) {
            throw new RuntimeException("Wrong UUID format");
        }
        $this->uuid = $uuid;
    }

    public function equals(Uuid $other): bool
    {
        return $this->uuid === $other->uuid;
    }

    public static function fromBytes(string $bytes): self
    {
        return new self(\uuid_unparse($bytes));
    }

    public function getBytes(): string
    {
        return \uuid_parse($this->uuid);
    }

    public function toString(): string
    {
        return $this->uuid;
    }
}

Even before running the benchmark for Uuid::fromBytes($bytes)->getBytes(), we already knew that it would outperform Ramsey/UUID because there are no calls to factories and other codecs. Moreover, internally, Ramsey/UUID keeps the UUID in a decomposed field structure, while in the previous code, the UUID is stored as a string.

So we looked at how the Symfony polyfill is implemented (it replaces the functions of the UUID extension when the extension is not available), and we figured out that it could be optimized, because uuid_unparse() did some expensive processing to validate the UUID structure.

Here's how we could optimize it:

<?php
final class Uuid {
    /**
     * @var string
     */
    private $uuid;

    public function __construct(string $uuid)
    {
        if (\preg_match('`^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$`Di', $uuid) === 0) {
            throw new RuntimeException("Wrong UUID format");
        }
        $this->uuid = $uuid;
    }

    public function equals(Uuid $other): bool
    {
        return $this->uuid === $other->uuid;
    }

    public static function fromBytes(string $bytes): self
    {
        if (\strlen($bytes) !== 16) {
            throw new RuntimeException("Invalid binary UUID. Length is not 16 bytes");
        }

        $hex = \substr_replace(\bin2hex($bytes), '-', 8, 0);
        $hex = \substr_replace($hex, '-', 13, 0);
        $hex = \substr_replace($hex, '-', 18, 0);
        return new self(\substr_replace($hex, '-', 23, 0));
    }

    public function getBytes(): string
    {
        return \hex2bin(\str_replace('-', '', $this->uuid));
    }

    public function toString(): string
    {
        return $this->uuid;
    }
}
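As a quick sanity check of the conversion technique used in this class, here is a minimal, self-contained sketch that extracts the same logic into two standalone functions and verifies a round trip. The function names `uuidBytesToString()` and `uuidStringToBytes()` and the sample value are hypothetical, introduced only for this illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: 16 raw bytes -> "8-4-4-4-12" text form.
function uuidBytesToString(string $bytes): string
{
    if (strlen($bytes) !== 16) {
        throw new InvalidArgumentException('Invalid binary UUID. Length is not 16 bytes');
    }

    // Insert the dashes left to right; each offset accounts for
    // the dashes already inserted before it.
    $hex = substr_replace(bin2hex($bytes), '-', 8, 0);
    $hex = substr_replace($hex, '-', 13, 0);
    $hex = substr_replace($hex, '-', 18, 0);

    return substr_replace($hex, '-', 23, 0);
}

// Hypothetical helper: text form -> 16 raw bytes.
function uuidStringToBytes(string $uuid): string
{
    if (preg_match('`^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$`Di', $uuid) !== 1) {
        throw new InvalidArgumentException('Wrong UUID format');
    }

    return hex2bin(str_replace('-', '', $uuid));
}

// Arbitrary sample value, used only for illustration.
$sample = '123e4567-e89b-12d3-a456-426614174000';
$binary = uuidStringToBytes($sample);
$roundTripped = uuidBytesToString($binary);

var_dump($roundTripped === $sample); // bool(true)
```

Note that `bin2hex()` always produces lowercase digits, so a UUID that arrives in uppercase comes back lowercased after a round trip through binary.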

The execution time for 1 million iterations is:

Ramsey => 2583 ms
ext-uuid => 1260 ms
polyfill-symfony => 4020 ms
custom => 642 ms

(The environment doesn't really matter; just know that the test was run on Ubuntu 19.10 with PHP 7.3.11.)
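For readers who want to reproduce this kind of measurement, here is a minimal timing sketch. It is not the benchmark script behind the figures above (that one is linked at the end of the article); the `benchmark()` helper is a hypothetical name, and the iteration count is reduced to 100,000 so the snippet runs quickly. It assumes PHP 7.3 or later for `hrtime()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: calls $operation $iterations times and
 * returns the elapsed wall-clock time in milliseconds.
 */
function benchmark(callable $operation, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds

    for ($i = 0; $i < $iterations; $i++) {
        $operation();
    }

    return (hrtime(true) - $start) / 1e6;
}

$bytes = random_bytes(16);

// Binary -> string -> binary, using the same string functions as the custom class.
$roundTrip = static function () use ($bytes): string {
    $hex = substr_replace(bin2hex($bytes), '-', 8, 0);
    $hex = substr_replace($hex, '-', 13, 0);
    $hex = substr_replace($hex, '-', 18, 0);
    $uuid = substr_replace($hex, '-', 23, 0);

    return hex2bin(str_replace('-', '', $uuid));
};

$elapsed = benchmark($roundTrip, 100000);
printf("custom => %.0f ms\n", $elapsed);
```

Absolute numbers depend heavily on the machine and PHP build, so only the ratios between implementations measured in the same run are meaningful.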

What?

We ran this benchmark many times to be sure, but yes, our PHP code is about twice as fast as the UUID extension!

As for generating UUIDs, since this is an infrequent operation for us, we kept using Ramsey/UUID for it.

In our context, we found that Ramsey/UUID is not the best option. But the most surprising thing is that our PHP code is more efficient at encoding and decoding UUIDs than the extension.
So, if you don't have this extension, it's not a big deal: for this operation, it is slower than the plain PHP version.

Update: Since writing this article, Nicolas Grekas has submitted a PR to optimize Symfony's polyfill.
Here is the benchmark result with this PR:

Ramsey => 2583 ms
ext-uuid => 1260 ms
polyfill-symfony => 4020 ms
polyfill-symfony-optim => 1145 ms
custom => 642 ms

Many thanks to Nicolas Grekas for reviewing this English translation.


Here is the code used for the benchmark (plus an additional test of the impact of the check in the constructor):

https://gist.github.com/f2r/4f21279732cc1ba81dddc05ef042a1f5
