DEV Community: Moxio

How to load an SQLite extension in PDO?

Arnout Boks — Thu, 28 May 2020 06:00:00 +0000

Recently we ran into the need to load an extension for SQLite using PHP's PDO database abstraction. Unfortunately, this is not supported out of the box. Eventually we found a solution involving FFI and Z-Engine. In this blog post I will take you on a tour of our solution and the black magic behind it.

The problem

At Moxio we make extensive use of geospatial database functions. Whether it's for infrastructural construction projects or rule management for the living environment, a lot of objects are bound to a specific geographic location and need to be queried as such. In production we run MariaDB as a database engine, which has geospatial support baked in. For integration tests however we use the lightweight SQLite database engine, which does not come with geospatial functions out of the box. The SpatiaLite extension for SQLite can be used for this.

Currently we use a homemade database abstraction layer, which uses the SQLite3 PHP extension for communicating with SQLite. This PHP extension allows loading SQLite extensions such as SpatiaLite using the SQLite3::loadExtension method. So far all is well.

At the moment we're looking at switching our own database abstraction layer for the Doctrine DBAL. It simply doesn't make sense anymore to maintain our own. We'd rather use a battle-tested open source solution and try to contribute back some of our code if possible, so that everybody benefits.

The problem is that the Doctrine DBAL implementation is not based on PHP's SQLite3 extension, but basically a wrapper around PDO, which is already a database abstraction layer. It therefore contains few SQLite-specific features. There are three sqlite-prefixed functions in PDO, but the possibility to load an SQLite extension is not one of them. This is registered in the PHP issue tracker and there's an implementation PR on GitHub, but that one didn't make it into PHP 7.4. So, no Doctrine DBAL for us?

Exploring solutions

Naturally, we started thinking of solutions. Even if PHP didn't expose the functionality to load extensions, maybe we could still call it directly through the SQLite C API. The Foreign Function Interface (FFI) introduced in PHP 7.4 would be a good candidate to do this. However, if we look at the relevant function declaration in the C API we notice a problem:

int sqlite3_load_extension(
  sqlite3 *db,          /* Load the extension into this database connection */
  const char *zFile,    /* Name of the shared library containing extension */
  const char *zProc,    /* Entry point.  Derived from zFile if 0 */
  char **pzErrMsg       /* Put error message here if not 0 */
);

The sqlite3_load_extension function requires a pointer to the database connection we want to load the extension into, which we don't have in PHP. We must find some trick to obtain it from the PDO object.

Luckily, through trips to phpCE 2018 and Bulgaria PHP 2019 I've come to know Alexander Lisachenko and his work on Z-Engine. Z-Engine is a PHP library that uses FFI to hook into the Zend Engine, providing access to PHP's internal structures. Maybe this would allow obtaining the internal sqlite3 pointer behind a PDO object? Indeed, after I reached out on Twitter, Alexander confirmed that this would probably be possible using Z-Engine.

Hence I set out to try a solution based on Z-Engine, but soon another challenge presented itself. I had hoped that the PDO object would have some sort of direct reference (something like a private variable) to the internal SQLite connection. This turned out not to be the case. Instead, the PDO implementation uses a common C trick where the PDO object and its internal structures are aligned next to each other in memory. In the C code they use a function php_pdo_dbh_fetch_inner to shift in memory between the PDO object and the internal struct. We cannot use this function from FFI however, since it is inlined and thus not exposed as a separate function to the outside world. I reached out to Alexander once again, and he and Nikita Popov pointed me to a trick involving handlers->offset to obtain the correct memory offset to implement the memory shift myself.

Putting it all together

This proved to be the missing puzzle piece. It still took some time to get used to working with PHP FFI, but eventually I got a working solution. First we need to obtain a C pointer to the relevant PDO object using Z-Engine:

use ZEngine\Core;
use ZEngine\Reflection\ReflectionValue;

Core::init();

$pdo_refl_value = new ReflectionValue($pdo);
$pdo_obj_pointer = $pdo_refl_value->getRawObject();

We then initialize an FFI instance to traverse the relevant structures of the PDO-SQLite extension. To keep the header definitions brief, we omit all parts of C structs after the properties we're interested in. Also, we replace pointers to properties before the ones we want (and the sqlite3 pointer itself for now) by void pointers. This allows us to omit the headers for their real types, and doesn't matter since a pointer takes the same amount of memory regardless of what it points to. Also, we had to add a struct keyword in some places to please the PHP FFI parser.

$pdo_sqlite_ffi = \FFI::cdef(<<<CDEF
/* From https://github.com/php/php-src/blob/d1764ca33018f1f2e4a05926c879c67ad4aa8da5/ext/pdo/php_pdo_driver.h#L432 */
struct _pdo_dbh_t {
    /* replaced pdo_dbh_methods* by void* */
    const void *methods;
    void *driver_data;
    /* omitted rest of struct */
};
/* From https://github.com/php/php-src/blob/d1764ca33018f1f2e4a05926c879c67ad4aa8da5/ext/pdo/php_pdo_driver.h#L510 */
struct _pdo_dbh_object_t {
    /* had to insert struct keyword here */
    struct _pdo_dbh_t *inner;
    /* omitted `zend_object std` */
};
/* Adapted from https://github.com/php/php-src/blob/cfc704ea83c56970a72756f7d4fe464885445b5e/ext/pdo_sqlite/php_pdo_sqlite_int.h#L55 */
struct pdo_sqlite_db_handle {
    /* replaced sqlite3* by void* */
    void *db;
    /* omitted rest of struct */
};
CDEF, "pdo_sqlite.so");

With that FFI instance we can now perform the memory shift to get the PDO internal data structure corresponding to the exposed PDO object. We use the handlers->offset trick to obtain the correct memory offset for the shift:

// Following https://github.com/php/php-src/blob/d1764ca33018f1f2e4a05926c879c67ad4aa8da5/ext/pdo/php_pdo_driver.h#L520
$offset = $pdo_obj_pointer->handlers->offset;
$pdo_dbh_object_pointer = $pdo_sqlite_ffi->cast("struct _pdo_dbh_object_t*", $pdo_sqlite_ffi->cast("char*", $pdo_obj_pointer) - $offset);
$pdo_dbh_pointer = $pdo_dbh_object_pointer[0]->inner;

The driver-specific handles are in the driver_data property. In case of the PDO SQLite driver this points to a pdo_sqlite_db_handle object, which contains the raw SQLite connection in its db property. We obtain a void pointer to that connection as follows:

// Following https://github.com/php/php-src/pull/3368/files#diff-eb26679695f7db289366ef6b03ee25daR729
$pdo_sqlite_db_handle_pointer = $pdo_sqlite_ffi->cast("struct pdo_sqlite_db_handle*", $pdo_dbh_pointer[0]->driver_data);
$sqlite3_void_pointer = $pdo_sqlite_db_handle_pointer[0]->db;

Now we set up an FFI object for communicating with the SQLite C API (or at least the parts of it we want to use):

$sqlite3_ffi = \FFI::cdef(<<<CDEF
/* From https://github.com/sqlite/sqlite/blob/278b0517d88d4150830a4ee2c628a55da40d186d/src/sqlite.h.in#L249 */
typedef struct sqlite3 sqlite3;

/* From https://github.com/sqlite/sqlite/blob/278b0517d88d4150830a4ee2c628a55da40d186d/src/sqlite.h.in#L6581 */
int sqlite3_load_extension(
  sqlite3 *db,          /* Load the extension into this database connection */
  const char *zFile,    /* Name of the shared library containing extension */
  const char *zProc,    /* Entry point.  Derived from zFile if 0 */
  char **pzErrMsg       /* Put error message here if not 0 */
);
CDEF, "sqlite3.so");

We can then cast our void pointer to an actual sqlite3 pointer and use it to call functions on the SQLite C API:

$sqlite3_pointer = $sqlite3_ffi->cast("struct sqlite3*", $sqlite3_void_pointer);
$sqlite3_ffi->sqlite3_load_extension($sqlite3_pointer, "mod_spatialite.so", null, null);

And voila, our extension is loaded!

Wrapping up

To make this solution easy to use, we published it as a Composer package. It hides all underlying Z-Engine and FFI logic and allows loading SQLite extensions through a simple API. We aim to grow this library to also provide access to other SQLite APIs that are otherwise not available in PHP. If there are other SQLite features that you'd like to use with PHP, feel free to file a feature request or create a PR!

Finally, please note that Z-Engine should not be considered stable before version 1.0.0. Use it at your own risk.

What Open Source libraries would you like to have?

Arnout Boks — Wed, 15 Jan 2020 14:11:15 +0000

At Moxio we already have some of our code as open source on GitHub, but we've identified over 25 other parts of our codebase (mainly PHP and JavaScript) that we would like to make freely available as open source. That's a lot, and it would take quite some time to release them all.

We can really use your help to set priorities. Just open this form and check all components that you could use and would like to see as open source (takes less than 2 minutes). Feel free to spread the link among your fellow developers; the more voices we hear the more useful code we can release.

Ignoring bulk change commits with git blame

Arnout Boks — Thu, 17 Oct 2019 07:00:00 +0000

A long-standing objection to making bulk changes to code using automated tools (e.g. to conform to a given code style) is that it clutters the output of git blame. With git 2.23, this does not have to be the case anymore! In this post I will start by explaining the value of git blame and how commits with style changes in bulk can be problematic. If you already understand this problem and just want a solution, you can directly skip to the new features git 2.23 has to offer.

Putting changes into context

A characteristic feature of legacy code is that it's often not clear why it operates the way that it does. Some of the original developers may have left or have been reassigned to another project, documentation is virtually nonexistent, and the few remaining developers do not remember all the details anymore. For example, one day you might stumble upon the following piece of code:

<?php
function describeBottles(int $amount = 42): string {
    return 'There are ' . $amount . ' bottles of cider on the wall.';
}

Despite being an artificial example, this code already raises some questions. Why is the default amount of bottles being described 42? And why do we describe bottles of cider? Bottles of beer would be a more customary alternative, right? Still these choices were probably made for a good reason; it's just that we don't know that reason.

It would be good if the reasoning behind these choices was documented using comments. However, as happens with legacy code, this is not the case. How can we still find out the motivation behind the current state of the code? A version control system such as git (you use version control, right?) may be helpful here. If you write good commit messages that focus on the why rather than the how, you might be able to distill the context from there. We only need to find which commit made a given change.

The git blame command (or git praise if you prefer a more positive mindset) can be helpful here. It shows, for each line in a file, which commit made the last change to that line, along with its timestamp and author:

$ git blame describeBottles.php
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 1) <?php
b589bf1e (John Smith 2019-07-03 14:42:46 +0200 2) function describeBottles(int $amount = 42): string {
2c386e07 (A.N. Other 2019-09-18 16:58:24 +0200 3)     return 'There are ' . $amount . ' bottles of cider on the wall.';
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 4) }

From this output we can see that line 3 was last changed by 'A.N. Other' in commit 2c386e07. If we look up the details for that commit we may find out why this function describes bottles of cider rather than beer:

$ git show 2c386e07
commit 2c386e07b72041af1e0c2f827ac31357829429dd
Author: A.N. Other <a.n.other@example.com>
Date:   Wed Sep 18 16:58:24 2019 +0200

    Change drink

    Extensive user testing has shown that our customers like
    cider better than beer.

    Jira: BOT-123

diff --git a/describeBottles.php b/describeBottles.php
index ef2b0fd..9336895 100644
--- a/describeBottles.php
+++ b/describeBottles.php
@@ -1,5 +1,5 @@
 <?php
 function describeBottles(int $amount = 42): string {
-    return 'There are ' . $amount . ' bottles of beer on the wall.';
+    return 'There are ' . $amount . ' bottles of cider on the wall.';
 }

Bingo! We have found the exact commit in which we swapped beer for cider, and more importantly: we know why. We even have a link to a Jira ticket where we can find more information. Perhaps it contains the full user testing results, providing us with even more context. This makes git blame an absolute life saver in legacy projects.

The problem: bulk changes

The team behind the describeBottles function has always used their own coding standards, with opening braces on the same line and 'CRLF' line endings. One day they decide to adopt the PSR-2 coding style guide that has become popular in the PHP community. Luckily there are tools like PHP-CS-Fixer and phpcbf to automatically convert the whole codebase to the new standard. There are similar tools for almost all other programming languages.

Now the team has one huge commit with style changes in their repository. It touches every line without altering the meaning or intent of the code. If we now used git blame to find the background for a line of code, the output would be:

$ git blame describeBottles.php
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 1) <?php
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 2) function describeBottles(int $amount = 42): string
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 3) {
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 4)     return 'There are ' . $amount . ' bottles of cider on the wall.';
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 5) }

As we can guess, the last commit that touched line 4 does not give us any useful context anymore:

$ git show df0ee6b0
commit df0ee6b006ee0f90cccc18b71ced290f6cae18d9 (HEAD -> master)
Author: Regina Phalange <r.phalange@example.com>
Date:   Thu Sep 26 16:51:58 2019 +0200

    Fix line endings

diff --git a/describeBottles.php b/describeBottles.php
index 17f0657..d9c9f99 100644
--- a/describeBottles.php
+++ b/describeBottles.php
@@ -1,5 +1,5 @@
-<?php
-function describeBottles(int $amount = 42): string
-{
-    return 'There are ' . $amount . ' bottles of cider on the wall.';
-}
+<?php
+function describeBottles(int $amount = 42): string
+{
+    return 'There are ' . $amount . ' bottles of cider on the wall.';
+}

Because these bulk changes render git blame useless, many teams refrain from applying automated style changes of this magnitude. That means they have to live with either a coding standard that they would rather not have, or with a codebase that does not follow their standards.

Git 2.23 to the rescue!

To limit the impact of such 'unimportant' bulk commits, git 2.23 adds a new option to git blame. Using --ignore-rev, one can specify a commit to be ignored by git blame. Lines changed by the ignored commit will be attributed to the previous commit touching that line instead. This means that even after our bulk style change, we can get back a meaningful context for the 'real' changes to our function:

$ git blame --ignore-rev df0ee6b0 describeBottles.php
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 1) <?php
b589bf1e (John Smith 2019-07-03 14:42:46 +0200 2) function describeBottles(int $amount = 42): string
b589bf1e (John Smith 2019-07-03 14:42:46 +0200 3) {
2c386e07 (A.N. Other 2019-09-18 16:58:24 +0200 4)     return 'There are ' . $amount . ' bottles of cider on the wall.';
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 5) }

Note how even line 3, which was added by the ignored commit, is attributed to commit b589bf1e, which originally added the brace on the line above.

When multiple bulk commits were added over time, it takes quite some effort to add a --ignore-rev for each of them in order to get a 'clean' output for git blame. Luckily, git also provides a way to make this easier on us. In your repository, create a file to hold commit hashes of commits to be ignored by git blame. Naming this file .git-blame-ignore-revs seems to be a common convention.

$ cat .git-blame-ignore-revs 
# Conversion to PSR-2 code style
237de8a6367a88649a3f161112492d0d70d83707

# Fix line endings
df0ee6b006ee0f90cccc18b71ced290f6cae18d9

The file should contain the full (40 char) commit hashes. Lines starting with a # are considered comments and can be used to explain what makes the given commit(s) unimportant. Now we can call git blame with the --ignore-revs-file option to ignore all these commits at once:

$ git blame --ignore-revs-file .git-blame-ignore-revs describeBottles.php
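Because the file format is this simple, it is easy to check such a file mechanically, for example in a pre-commit hook or CI step. Below is a minimal sketch in PHP; the parseIgnoreRevs helper is hypothetical (it is not part of git or of any library), and the 40-character check assumes SHA-1 commit hashes as used in this post.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the commit hashes listed in the contents of
 * an ignore-revs file, rejecting anything that is not a full 40-char hash.
 *
 * @return string[]
 */
function parseIgnoreRevs(string $contents): array
{
    $hashes = [];
    foreach (preg_split('/\R/', $contents) as $number => $line) {
        $line = trim($line);
        // Skip blank lines and comment lines starting with '#'.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        if (preg_match('/^[0-9a-f]{40}$/', $line) !== 1) {
            throw new InvalidArgumentException(
                sprintf('Line %d is not a full commit hash: %s', $number + 1, $line)
            );
        }
        $hashes[] = $line;
    }
    return $hashes;
}

$contents = <<<TXT
# Conversion to PSR-2 code style
237de8a6367a88649a3f161112492d0d70d83707

# Fix line endings
df0ee6b006ee0f90cccc18b71ced290f6cae18d9
TXT;

$hashes = parseIgnoreRevs($contents);
echo count($hashes), " commits will be ignored\n";
```

An abbreviated hash such as df0ee6b0 makes this helper throw, which catches the most likely mistake when copying hashes from git log output.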

The .git-blame-ignore-revs can be versioned inside the repository, so that all developers can use (and maintain) the same list of ignored commits. To avoid typing the extra option with every command, we can set the blame.ignoreRevsFile configuration variable:

$ git config blame.ignoreRevsFile .git-blame-ignore-revs

This causes git to automatically ignore the commits specified in that file for every call to git blame. If you stick to the .git-blame-ignore-revs naming convention you can even set this configuration variable globally, so that it applies to all your repositories, each with their own .git-blame-ignore-revs file. Be aware however that git currently gives an error when this setting is configured globally but a repository has no .git-blame-ignore-revs file yet. I hope that this is considered a bug and will be fixed in an upcoming version.

Another limitation to be aware of is that platforms like GitHub and GitLab do not yet support files with commits to ignore for the 'blame'-button in their user interface. It would be awesome if they added such a feature soon.

One last thing: be aware that you need at least version 2.23 of git to use these new features. On the git downloads page you can find out how to obtain the latest git for your platform. But even if you cannot upgrade yet for some reason, you can already start building a .git-blame-ignore-revs file with commits you would like to hide from git blame. That way you can hit the ground running when it's time to upgrade.

Summary

Git 2.23 contains an absolute game changer that is not even mentioned in the release highlights. Fear of polluting the git blame output no longer has to be a blocker for applying style changes in bulk: these commits can now be ignored. You can even share a list of ignored commits with your entire team. So go ahead and switch over to that new coding standard; git won't hold you back anymore.

Mutation testing in PHP

Arnout Boks — Wed, 17 Jul 2019 22:00:00 +0000

Recently I spent an afternoon experimenting with mutation testing in PHP. In this post I would like to share the background, the main idea of mutation testing, and the lessons I’ve learned from it.

Moxio Academy

At Moxio we regularly schedule a Moxio Academy session. That means we put our normal work aside for an afternoon to learn something new or experiment with the latest technologies. You can choose any topic, as long as it benefits your personal development and thus your role at the company. There are just a few rules:

You have to decide on a topic and announce it in advance. This is meant to improve synergy within the team: maybe someone already knows a lot about the topic and has some resources to recommend. If multiple people want to learn about the same thing, maybe they can combine their efforts and learn together.
You must create a plan for how to approach your topic. This does not have to be very extensive, just a rough sketch is enough. The idea is that this forces you to focus your learning efforts rather than just reading random resources without any sense of a general direction.
You should work towards some kind of deliverable. Ideally this is something that can be shared with the rest of the team: a tool, wiki article, some proof-of-concept, blogpost or lunch-and-learn presentation. Again this helps to focus the process, to have some concrete end result, and to spread the acquired knowledge within the team.

We usually conclude the afternoon together to present our process and results, and (for those who want) have a pizza.

Mutation testing

At the latest Moxio Academy I wanted to experiment with mutation testing in PHP. I had already heard a few times about mutation testing, and about Infection as a tool for it, but never got around to playing with it hands-on. This seemed like a good opportunity.

In essence, mutation testing is a way of measuring the quality of a test suite. A tool generates a number of copies (mutants) of your source code under test, but modifies each of them in a small way. These small modifications are basically just errors that you as a programmer could have made, such as replacing a < with a <=. A high-quality test suite would detect these modifications (kill the mutant) by failing one or more tests. In a suboptimal test suite it might happen that the tests remain green despite the modification to the source code. In such a case we speak of an escaped mutant. These present an opportunity to improve the test suite by adding a test that fails in the presence of the given modification. The Mutation Score Indicator (MSI, the percentage of mutants detected by the test suite) provides a metric for the quality of the test suite.
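To make the idea concrete, here is a minimal sketch in plain PHP. It is not how Infection works internally: the isKilled and mutationScore helpers and the 'low stock' example are made up for illustration, and the score is simplified to the number of killed mutants divided by the total number of mutants.

```php
<?php
declare(strict_types=1);

// Code under test ("is stock running low?") and two hand-written mutants.
$original = fn(int $stock): bool => $stock < 10;
$mutantLe = fn(int $stock): bool => $stock <= 10; // '<' replaced by '<='
$mutantGt = fn(int $stock): bool => $stock > 10;  // '<' replaced by '>'

// Each test is a closure returning true when it passes for the given implementation.
$weakSuite = [
    fn(callable $isLow): bool => $isLow(3) === true,
    fn(callable $isLow): bool => $isLow(50) === false,
];
$strongSuite = array_merge($weakSuite, [
    // Boundary test: the only one that distinguishes '<' from '<='.
    fn(callable $isLow): bool => $isLow(10) === false,
]);

/** A mutant is killed when at least one test fails against it. */
function isKilled(callable $mutant, array $suite): bool
{
    foreach ($suite as $test) {
        if (!$test($mutant)) {
            return true;
        }
    }
    return false;
}

/** Simplified MSI: percentage of mutants killed by the suite. */
function mutationScore(array $mutants, array $suite): float
{
    $killed = count(array_filter($mutants, fn(callable $m): bool => isKilled($m, $suite)));
    return 100 * $killed / count($mutants);
}

$mutants = [$mutantLe, $mutantGt];
printf("Weak suite MSI:   %.0f%%\n", mutationScore($mutants, $weakSuite));
printf("Strong suite MSI: %.0f%%\n", mutationScore($mutants, $strongSuite));
```

Both suites are green against the original code, yet the weak suite lets the '<=' mutant escape. Adding the boundary test is exactly the kind of improvement an escaped mutant points to.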

My plan was to apply Infection to one of our internal libraries. I picked this library because of its high (line-based) code coverage, which would imply a high-quality test suite. Therefore I wondered what ‘leaks’ Infection could still find in it. The idea was to convert the escaped mutations found by Infection to new tests, working towards a PR to increase the MSI of the project as the main deliverable. This blog post wasn’t part of the original plan, but arose as an extra way of sharing some of the lessons learned along the way.

Lessons learned

Mutation testing is awesome!

My main takeaway was that mutation testing can really help you to improve your test suite. Even on a project with 96% line coverage, Infection found multiple scenarios that were not actually covered by the test suite.

One simplified example of this is the following. Suppose we have a function to generate a description for the amount of bottles of beer on the wall:

<?php
function describeBottles(int $amount = 99): string {
    return $amount . ' bottles of beer on the wall';
}

We could already have a test for this function like this:

<?php
use PHPUnit\Framework\TestCase;

class DescribeBottlesTest extends TestCase {
    public function testDescribesTheAmountOfBottlesOfBeer() {
        $this->assertSame('42 bottles of beer on the wall', describeBottles(42));
    }
}

At first sight it may seem that this fully tests the function, and indeed the function shows up with all lines covered in a code coverage report. However, our tests do not check the default value for the argument. This means that some behavior of our function, i.e. that by default it describes 99 bottles, is not verified. Infection can uncover this when it produces a mutation like:

12) /tmp/bottles.php:2    [M] DecrementInteger

--- Original
+++ New
@@ @@
<?php
- function describeBottles(int $amount = 99): string {
+ function describeBottles(int $amount = 98): string {
      return $amount . ' bottles of beer on the wall';
  }

Here the DecrementInteger mutator has decremented an integer literal occurring in the source code, an error we could have made ourselves if we hit the wrong key on our keyboard. This would currently go unnoticed, but we can fix that by adding a test like:

<?php
use PHPUnit\Framework\TestCase;

class DescribeBottlesTest extends TestCase {
    public function testDescribes99BottlesByDefault() {
        $this->assertSame('99 bottles of beer on the wall', describeBottles());
    }
}

Other untested aspects commonly found by Infection were exception messages and some uncovered paths through complex logic. I added tests or assertions for most of these. For the logic, this is also a sign that the (cyclomatic/N-path) complexity is too high. Those pieces of code should be refactored, but I scheduled that for later.

On the first Infection run, without any changes to the test suite, it produced an 89% MSI. This is already quite good, but with some additions to the test suite I managed to raise the MSI to 93%.

‘False positives’

Getting the MSI much higher proved to be difficult though. Sometimes escaped mutants had changes to parts of the code that are nonessential details. In our view, these details do not constitute relevant behavior and are not part of the ‘contract’ of that unit. Why are they in the code then? Well, sometimes they have to be due to syntactical constraints. Take for example the following piece of code that may throw an exception:

<?php
// ...
try {
    $database->executeSql("...");
} catch (DuplicateDatabaseKeyException $e) {
    throw new UserAlreadyExistsException("User $username already exists", 0, $e);
}

As explained in a previous blog post, we think it is important that exceptions are thrown at the right level of abstraction:

Best practices for PHP exception handling

This requires catching and re-throwing exceptions like in the above code snippet. In that post, I also mentioned that we want to maintain the connection with the root cause by setting the $previous parameter of the new exception. Due to PHP's syntax this requires one to also provide the $code parameter, which we do not really use. We usually set it to 0 (the default), but honestly couldn't care less about its value.

Now the same DecrementInteger mutator could come along and produce this mutation:

53) /tmp/exception.php:289   [M] DecrementInteger

--- Original
+++ New
@@ @@
  } catch (DuplicateDatabaseKeyException $e) {
-     throw new UserAlreadyExistsException("User $username already exists", 0, $e);
+     throw new UserAlreadyExistsException("User $username already exists", -1, $e);
  }

This mutation will not get caught by our test suite (because we do not assert the code of produced exceptions), so the mutant will escape. In this case we do not care however. We don’t expect our test suite to detect this, as (to us) the mutated code is just as fine as the original. We could add assertions for the exception code, but that would just be extra work without yielding extra value.
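As an aside, and assuming PHP 8.0 or later (which postdates this post), named arguments make it possible to set $previous without spelling out $code at all, so there is no integer literal left at that spot to mutate. A minimal sketch, with hypothetical exception classes and a registerUser function standing in for the real code:

```php
<?php
declare(strict_types=1);

// Hypothetical exception classes, named after the ones in the snippet above.
class DuplicateDatabaseKeyException extends RuntimeException {}
class UserAlreadyExistsException extends RuntimeException {}

function registerUser(string $username): void
{
    try {
        // Stand-in for the real database call.
        throw new DuplicateDatabaseKeyException("Duplicate key '$username'");
    } catch (DuplicateDatabaseKeyException $e) {
        // Named argument: $code keeps its default of 0 without being written out.
        throw new UserAlreadyExistsException("User $username already exists", previous: $e);
    }
}

$caught = null;
try {
    registerUser('alice');
} catch (UserAlreadyExistsException $e) {
    $caught = $e;
    echo $e->getMessage(), ' (caused by ', get_class($e->getPrevious()), ")\n";
}
```

Whether this is worth adopting is a matter of taste; it does not change the point that such a mutant is harmless.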

I think this is something to be aware of when doing mutation testing. Not all escaped mutants are necessarily bad. For each of them you have to ask yourself the question “Would it be bad if I made this ‘error’ in my code?”. If the answer is no, don’t bother about the escaped mutant.

What test to add?

One of the main difficulties when trying to kill a mutant was figuring out what kind of test to add. Just from seeing a changed line in the code it is not always clear how to write a test that would fail on the given line. This was further amplified by the fact that the codebase I worked with was written by a colleague, and I did not know it inside out yet.

I learned that the HTML code coverage report generated by PHPUnit can be of tremendous help here. If you hover over a covered line of code there, it shows you which tests cover that line. This way you can look up which tests already exercise the mutated line of code. The test you want to add to kill the mutant is probably a variation of one of them. This reduces your problem to analyzing these ‘example’ tests and reasoning about what you could change in them to fail when the mutation is present.

It improves your code too

Not only the test suite got some updates during my experiments with mutation testing; the production code improved as well. Sometimes the mutants generated by Infection were actually better than the original version! One such case looked somewhat like this:

<?php
class SomeStore {
    public static function createWithInMemoryDatabase(): self {
        $database = $database ?? new SqliteDatabase(':memory:');
        // ...
    }
}

Among the generated mutations there was one that removed the $database ?? null coalesce operator. This is actually an improvement, as the null coalesce operator is useless here! At the start of the function, $database is always null, so the operator always resolves to its right-hand side, creating a new database. This code was an artifact from a moment when the method was named differently and allowed injecting a custom database through a $database parameter. Now that this parameter has been removed, we can get rid of the null coalesce as well. While other static analysis tools could have found this dead code as well, at least Infection brought it to our attention.

Another example where the mutant turned out to be better was the removal of a trim() function. At that spot in the code there could never be any significant or problematic whitespace. The trim()-call thus was unnecessary and could be removed.

Conclusion

Once you’ve had some practice with mutation testing, it can really help with improving both your test suite and code. Infection is straightforward to set up and use, and makes it fairly simple to get started in PHP. Just keep in mind that not all escaped mutants are a problem and blindly striving for 100% MSI does not add value. Try it out, and let me know what your experiences are!

Best practices for PHP exception handling

Arnout Boks — Wed, 09 Jan 2019 23:00:00 +0000

Handling errors or other non-’happy path’ situations is essential when creating robust PHP applications. While errors were the main construct to do so in PHP 4, exceptions have been around since PHP 5. They should nowadays be considered the main mechanism for handling alternative or exceptional paths. It seems that these alternative paths still don’t always get the attention they deserve, though.

Proper exception handling takes quite some effort, but will eventually result in a much more stable application. A sensible exception handling strategy makes it clear what exceptions should be expected (and thus handled!) at a given point in the code. Moreover it will maintain the encapsulation and abstraction you carefully applied to your object-oriented design. Last but not least, it should make debugging a breeze.

In this post, I would like to introduce you to the set of best practices we have adopted at Moxio over the years. We have found these to work very well for us, but keep in mind that they are our best practices; your mileage may vary. The following guidelines are aimed at PHP code, but the basic principles behind them will also (with some translations) work for similar languages.

Types of exceptions

We make a distinction between the two top-level types of exceptions that the PHP SPL library defines. These are LogicException and RuntimeException. The interpretation of these two types in literature varies. We however attach the following meaning to them:

A LogicException is an exception that should require a fix or change in code. With 'code' we do not only mean source code, but any content managed by a developer. This includes configuration and database contents that are not maintained by an end user of the system. A LogicException is mainly an internal guard or assertion. In perfectly written and wired code one should never occur.
A RuntimeException is an exception that might also occur in code that is written and configured 'perfectly'. Such an exception may be caused by input from the end user, or an error in (the communication with) an external system. When a RuntimeException is thrown, it should not require a fix in the code. If the exception ends up uncaught however, we should add some code to handle it. This may mean logging the error, using a fallback strategy, reporting the error to the user, or a combination thereof.

Note that this interpretation differs from the generic perception in Java, where a RuntimeException is what we would call a LogicException. Furthermore, in our standards we classify all exceptions into either one of these categories. We do not create exceptions as a direct subclass of Exception.
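To make the distinction a bit more tangible, here is a minimal sketch. The function names and the InvalidAgeInputException class are made up for this illustration; only LogicException and RuntimeException come from PHP itself.

```php
<?php
// Hypothetical exception for this sketch: caused by input from an end user.
final class InvalidAgeInputException extends RuntimeException {}

/**
 * @throws InvalidAgeInputException when the user-supplied value is not a valid age
 */
function parseAgeFromUserInput(string $input): int
{
    // Even perfect code cannot prevent a user from typing nonsense.
    if (preg_match('/\A\d+\z/', $input) !== 1) {
        throw new InvalidAgeInputException("'$input' is not a valid age");
    }
    return (int) $input;
}

function describeAgeGroup(int $age): string
{
    if ($age < 0) {
        // A negative age can only come from a bug in the calling code.
        throw new LogicException("Age cannot be negative, got $age");
    }
    return $age >= 18 ? 'adult' : 'minor';
}
```

The first function guards against something that can happen in a correct program, the second against something that should require a code fix.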

Exceptions as part of the function signature

We see the possible variants of RuntimeException that can be thrown or bubbled up from a function as part of the contract of that function. That means such exceptions should be annotated on the function using @throws.

If the function is part of the implementation of an interface, that interface should specify that (a supertype of) the exception in question could be thrown. Annotating such an exception on the implementation as well is not necessary. A function that implements an interface should never throw a runtime exception not declared on the interface. Such a case would be a violation of the Liskov Substitution Principle.

On the contrary, we consider the different types of LogicException to not be part of the contract of a function. Of course we assume a perfect implementation, but at the same time we know there can always be an error somewhere. Therefore a LogicException can always be expected and unexpected at the same time. Hence annotating it using @throws is not desirable. See also 'catching exceptions' below.
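As a sketch of what this could look like in practice, consider the following example. All class and method names are hypothetical and only serve to show where the @throws annotation goes and where it is deliberately left out.

```php
<?php
// Hypothetical runtime exception that is part of the contract.
final class UnknownCurrencyException extends RuntimeException {}

interface ExchangeRateProviderInterface
{
    /**
     * @throws UnknownCurrencyException when no rate is known for the currency
     */
    public function getRate(string $currency): float;
}

final class FixedExchangeRateProvider implements ExchangeRateProviderInterface
{
    /** @var array<string, float> developer-managed configuration */
    private $rates;

    /** @param array<string, float> $rates */
    public function __construct(array $rates)
    {
        $this->rates = $rates;
    }

    // No @throws here: the interface already declares UnknownCurrencyException.
    public function getRate(string $currency): float
    {
        if (!array_key_exists($currency, $this->rates)) {
            throw new UnknownCurrencyException("No rate known for '$currency'");
        }
        $rate = $this->rates[$currency];
        if ($rate <= 0.0) {
            // The configuration itself is wrong: a LogicException,
            // deliberately not annotated with @throws.
            throw new LogicException("Configured rate for '$currency' must be positive");
        }
        return $rate;
    }
}
```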

Creating exception subclasses

Under these guidelines, creating a hierarchy of subtypes under RuntimeException is very desirable. The more specific the exceptions we throw (and annotate as part of our contract), the more granular we can handle them. Subclasses of LogicException however are not necessary. These are not part of the contract of a function anyway.
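A small sketch of such a hierarchy, with made-up payment exceptions, shows how specific subtypes allow more granular handling while a common base type still serves as a fallback:

```php
<?php
// Hypothetical hierarchy under RuntimeException.
abstract class PaymentException extends RuntimeException {}
final class InsufficientFundsException extends PaymentException {}
final class CardExpiredException extends PaymentException {}

/**
 * @throws InsufficientFundsException
 * @throws CardExpiredException
 */
function charge(int $balanceInCents, int $amountInCents, bool $cardExpired): int
{
    if ($cardExpired) {
        throw new CardExpiredException('The card has expired');
    }
    if ($amountInCents > $balanceInCents) {
        throw new InsufficientFundsException('Balance is too low');
    }
    return $balanceInCents - $amountInCents;
}

function checkoutMessage(int $balanceInCents, int $amountInCents, bool $cardExpired): string
{
    try {
        $remaining = charge($balanceInCents, $amountInCents, $cardExpired);
        return "Paid, $remaining cents left";
    } catch (InsufficientFundsException $e) {
        // Specific subtype: we can offer a targeted way out.
        return 'Please top up your balance first';
    } catch (PaymentException $e) {
        // Any other payment problem gets the generic treatment.
        return 'Payment failed: ' . $e->getMessage();
    }
}
```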

Catching exceptions

To us, a RuntimeException is a checked exception. When such an exception can be thrown from a function, the calling function needs to either catch that exception or declare it as a possible exception from itself using @throws. Catching a runtime exception is a good idea when the calling code can sensibly handle the exception, or when it can re-throw it at a better level of abstraction (see further down). At least it is important to think about the handling of these exceptions when calling a function that might throw them.

A logic exception should never be caught, at least not based on the type LogicException or a subclass thereof. These exceptions should never occur in a correct implementation. It therefore does not make sense to try to handle them along the line. Instead, the logic that triggered the exception should be fixed.

At some point it may be necessary to catch all exceptions, including both variants of RuntimeException and LogicException. As we require PHP 7 at minimum, we do this using Throwable in our catch-clause, not Exception. Constructs like this are exclusively meant for component entry points, and should never make assumptions about the specific cause or character of the exception. Such code will therefore never contain specific error handling logic. Instead, it is a generic catch-all for logging or reporting the error, or giving the end user feedback that something went wrong.

Sometimes a catch-all like this is also used for cleaning up resources or closing open connections. A finally-block is often better in such situations, especially if the exception is re-thrown at the end of the catch-block.
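The following sketch combines both ideas: a generic catch-all on Throwable at an entry point, and a finally-block for cleanup. The runEntryPoint function and its callable parameters are assumptions made for this example, and the in-memory stream merely stands in for a real connection or lock.

```php
<?php
/**
 * Hypothetical component entry point.
 *
 * @param callable $action receives the resource and returns a response string
 * @param callable $log    receives a log message
 */
function runEntryPoint(callable $action, callable $log): string
{
    // Stand-in for a connection, lock or other resource that needs cleanup.
    $resource = fopen('php://memory', 'r+');
    try {
        return $action($resource);
    } catch (Throwable $e) {
        // Generic handling only: no assumptions about the specific cause.
        $log(get_class($e) . ': ' . $e->getMessage());
        return 'Something went wrong, please try again later.';
    } finally {
        // Runs on success and on failure alike.
        fclose($resource);
    }
}
```

Because the cleanup lives in the finally-block, the catch-block does not need to know about the resource at all.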

Special case: debug info

One valid reason to catch a LogicException anyway is to augment it with extra debugging information that was not available deeper in the call stack. In such a case we directly throw a new LogicException with the extra data, and the original exception in $previous. We preferably catch such an exception based on the most generic type possible. This may be a common base class or marker interface for all LogicExceptions that can occur in the given piece of code.

The alternative for such a catch would be to pass the debug information on to the callees that eventually produce the error. We prefer not to do so if such information is only used for passing it into an exception.
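As an illustration, assume a low-level check that knows nothing about its context and a caller that does. The function names and the configuration shape below are hypothetical.

```php
<?php
function assertValidPort(int $port): void
{
    if ($port < 1 || $port > 65535) {
        // Deep in the call stack we only know the offending value.
        throw new LogicException("Port $port is out of range");
    }
}

/**
 * @param array<string, int> $servicePorts developer-managed configuration
 */
function validateServicePorts(array $servicePorts): void
{
    foreach ($servicePorts as $serviceName => $port) {
        try {
            assertValidPort($port);
        } catch (LogicException $e) {
            // Only this level knows which service the port belongs to, so we
            // add that information and keep the original as $previous.
            throw new LogicException("Invalid configuration for service '$serviceName'", 0, $e);
        }
    }
}
```

The alternative would be to pass the service name into assertValidPort purely so that it can end up in an exception message.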

Throwing a new exception after catching

After catching an exception, it is of course possible to throw a new exception. Indeed, for sensible exception handling this is needed more than one may expect. Just make sure to always set the $previous-parameter of the new exception to the original (caught) exception. This ensures that the full cause of the exception can still be derived. Because of the order of the Exception constructor parameters it is then often necessary to specify a $code. We don't use this parameter, so we just set it to 0.

Translation to the correct level of abstraction

Catching-and-throwing is often necessary to ensure that an exception manifests itself at a suitable level of abstraction. Suppose we have a UserRepositoryInterface that is implemented by a DatabaseUserRepository. The latter stores users in the database, which raises a DuplicateDatabaseKeyException if a user with the given username already exists. According to the rules described earlier we should use @throws to annotate this exception on the interface, but that rightly feels a bit strange. Why would a generic interface, meant to abstract the storage mechanism away, know about a type of exception specific to a database? The solution is to catch the DuplicateDatabaseKeyException within DatabaseUserRepository and throw something like a UserAlreadyExistsException in its place. This exception matches the level of abstraction of UserRepositoryInterface: it knows about users, but not how they are persisted. It can therefore be added to the signature of that interface without issues.
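A minimal sketch of this translation could look as follows. The add method and the FakeUserTable class are assumptions for the sake of a self-contained example; a real repository would talk to an actual database.

```php
<?php
// Low-level exception raised by the storage layer.
final class DuplicateDatabaseKeyException extends RuntimeException {}

// Domain-level exception that says nothing about how users are stored.
final class UserAlreadyExistsException extends RuntimeException {}

interface UserRepositoryInterface
{
    /**
     * @throws UserAlreadyExistsException when the username is already taken
     */
    public function add(string $username): void;
}

// Minimal stand-in for a database table with a unique key (assumption).
final class FakeUserTable
{
    private $rows = [];

    /**
     * @throws DuplicateDatabaseKeyException
     */
    public function insert(string $key): void
    {
        if (isset($this->rows[$key])) {
            throw new DuplicateDatabaseKeyException("Duplicate key '$key'");
        }
        $this->rows[$key] = true;
    }
}

final class DatabaseUserRepository implements UserRepositoryInterface
{
    private $table;

    public function __construct(FakeUserTable $table)
    {
        $this->table = $table;
    }

    public function add(string $username): void
    {
        try {
            $this->table->insert($username);
        } catch (DuplicateDatabaseKeyException $e) {
            // Translate to the abstraction level of the interface. $code is
            // unused (0); the original exception travels along as $previous.
            throw new UserAlreadyExistsException("User '$username' already exists", 0, $e);
        }
    }
}
```

Callers of UserRepositoryInterface only ever see UserAlreadyExistsException, while the database-specific cause remains available through getPrevious() for debugging.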

From RuntimeException to LogicException

It is very well possible that an exception that was a RuntimeException is at some point converted into a LogicException. This has to do with specific knowledge we have at that point, from which we know that the given exception should not be possible. That knowledge was not available deeper in the call stack.

To illustrate this, assume we have an XML reader with a method getUniqueTagContents. This method reads the contents of one unique tag from an XML file based on the tag name. A lot of things can go wrong inside such a method: the XML file may be malformed, the given tag may not be present, or it may occur multiple times. These are all examples of a RuntimeException. Without extra knowledge about the origin of the XML (which may be uploaded by a user) and the tag name they can also occur in a perfectly programmed application. But it is possible that we use this method on a piece of XML that we just validated against an XML schema that enforces the existence and uniqueness of the given tag. In such a situation we know that getUniqueTagContents should not fail. The same applies when we use the method to read a configuration file that we put in VCS ourselves and which we thus fully control.

In such situations we still have to catch the ‘impossible’ runtime exceptions, as we consider them checked. In this catch-block we then throw a LogicException: this situation should never happen. Of course we save the original exception through the $previous parameter.
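The sketch below shows the conversion for the configuration file case. Note that this getUniqueTagContents is a deliberately naive, regex-based stand-in written for this example (it is not a real XML parser), and that TagNotUniqueException and readAppName are hypothetical names.

```php
<?php
final class TagNotUniqueException extends RuntimeException {}

/**
 * Naive stand-in for an XML reader method; only handles simple, flat tags.
 *
 * @throws TagNotUniqueException when the tag is absent or occurs more than once
 */
function getUniqueTagContents(string $xml, string $tagName): string
{
    $quoted = preg_quote($tagName, '~');
    $count = preg_match_all('~<' . $quoted . '>(.*?)</' . $quoted . '>~s', $xml, $matches);
    if ($count !== 1) {
        throw new TagNotUniqueException("Expected exactly one <$tagName> tag, found " . (int) $count);
    }
    return $matches[1][0];
}

/**
 * @param string $bundledConfig XML that we keep under version control ourselves
 */
function readAppName(string $bundledConfig): string
{
    try {
        return getUniqueTagContents($bundledConfig, 'name');
    } catch (TagNotUniqueException $e) {
        // We control this file, so a failure here means we made a mistake.
        throw new LogicException('Bundled configuration must contain exactly one <name> tag', 0, $e);
    }
}
```

The function readAppName needs no @throws annotation: from its point of view the failure can only be a programming or configuration error.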

This is a pattern that occurs often. Deep in the call stack, where the bigger picture is not available, many faults are a RuntimeException. As the exception bubbles up (whether translated to another level of abstraction or not), it reaches a point where we know the error should be impossible. At that point it becomes a LogicException. Note that the inverse is not possible: an unexpected error can not suddenly be expected.

A grey area

The distinction between a RuntimeException and a LogicException is not always 100% clear. There is a grey area where the correct type of exception depends on interpretation and the semantic contract of a function. A few examples to illustrate this:

Syntax error in a query

A method executeQuery to execute a database query can fail due to a syntax error in that query. At first sight this looks like a RuntimeException: we don't know where the query comes from and thus cannot guarantee its syntactical correctness. On the other hand it would be very strange if user input (or input from another source beyond our control) could lead to a syntax error. That smells of SQL injection. It is therefore very reasonable to state that the code calling executeQuery is responsible for the syntactical correctness of the query. That makes the exception a LogicException. An exception to this rule would be if we were building an application like adminer or phpMyAdmin, where we should expect errors in user-entered SQL queries.

Cache item not found

Suppose we have a cache class with a method get($key) to retrieve a cache item. Of course it can happen that get is called with a key that does not exist. We assume that we have chosen to communicate this via an exception (alternatives would be through the return value or a by-reference parameter). Would such an exception be a runtime or a logic exception?

The answer probably depends on the other methods on the cache class and how we expect to use these. If the cache has a has method we could demand that a consumer uses that method to check whether a cache item exists before retrieving it with get. In that case a LogicException is reasonable. Depending on the implementation of the cache there may be a tiny chance that a cache item is deleted between the calls to has and get. In such cases even a perfect implementation (which checks existence first) cannot fully prevent the error. The 'correct' exception category is not very clear here. For now we tend to use a LogicException for situations with rare edge cases like this.
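Under the has-then-get contract, a sketch of such a cache could look like this. The ArrayCache class and the cachedSquare consumer are made up for this example; an in-memory array also conveniently avoids the race between has and get mentioned above.

```php
<?php
final class ArrayCache
{
    private $items = [];

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }

    public function has(string $key): bool
    {
        return array_key_exists($key, $this->items);
    }

    public function get(string $key)
    {
        if (!$this->has($key)) {
            // The contract demands a has() check first, so this is a bug
            // in the consumer and not annotated with @throws.
            throw new LogicException("Cache item '$key' does not exist; call has() first");
        }
        return $this->items[$key];
    }
}

// A consumer that honours the contract.
function cachedSquare(ArrayCache $cache, int $n): int
{
    $key = "square:$n";
    if (!$cache->has($key)) {
        $cache->set($key, $n * $n);
    }
    return $cache->get($key);
}
```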

Corrupt data in the database

Another grey area is formed by errors that can only occur in case of a corrupted database. From a purist point of view this is an example of a RuntimeException, the database being an external factor. On the other hand it is undesirable to have to take the possibility of database corruption into account everywhere in the code. This applies all the more if only the application and occasionally a developer or sysadmin write to the database. A more pragmatic approach is to consider database corruption a LogicException. Chances are that the corrupt data was caused by an implementation error in the application. Anyway, developer or sysadmin intervention is required to manually fix the data.

Recap

We distinguish checked exceptions that represent inherently ‘unfixable’ situations from unchecked exceptions that represent programming errors. Therefore we know at every point in the code what exceptions we should expect, and thus handle. By catching and re-throwing exceptions we make sure that they are at the proper level of abstraction and thus do not break encapsulation. Chaining the original exception using the $previous parameter ensures that no debugging information is lost.

Are you trying out these practices in your project? Let us know if they work for you, if there are obstacles you run into, or if you have improvements to these guidelines. We’d love to get your feedback!

Start testing with PHPT tests in PHPUnit

Arnout Boks — Wed, 20 Jun 2018 22:00:00 +0000

Over the years, automated testing has become an established practice in software development. It thus has also become an essential skill to learn for any developer. In the past few months, I have talked to several developers who had recently started with testing in PHP. Some of them expressed they found their testing framework of choice (often PHPUnit) quite difficult to get started with.

In a sense, this is not surprising. Getting started with testing is already challenging on its own. You have to choose units to test and learn to write testable code. You must learn to distinguish essential behavior from implementation details. At the same time you have to get to know your testing framework. Most popular and mature testing tools need quite some boilerplate and background knowledge. This adds to the already steep learning curve. As people learn best by tackling one step at a time, this is not ideal. It would be much better if we had an easy testing framework that juniors can use to learn the basics of testing without the extra overhead.

PHPT tests

Enter PHPT tests. Through work on some small bug fixes in the PHP core (and speaking about that) I learned about the PHPT test format. This is the test format used for testing the PHP interpreter itself. It is straightforward and simple to get into. The main idea is that a test contains a PHP script that prints output, and the output expected from that script. An elementary PHPT test would look something like this:

--TEST--
Basic arithmetic - addition
--FILE--
<?php var_dump(42 + 1); ?>
--EXPECT--
int(43)

Here, the --TEST-- section provides a short description about what the test aims to test. The --FILE-- section contains a script that prints some output. In this case we output the result of adding two integers. After --EXPECT-- comes the output we expect from the --FILE-- section.

Note that we use var_dump for outputting the result value. If we would use echo or print the output would be just 43. In that case we would not know whether that was the integer 43 or the string “43”. That makes us unable to verify that the result is of the correct type. Therefore we prefer var_dump: it shows the type of the value.

The PHPT test format is not feature-rich. Still it supports all the constructs necessary to write sensible tests. You can use them by adding these sections:

--EXPECTF--, --EXPECTREGEX--: Instead of specifying the exact desired output with --EXPECT--, one can specify a pattern for the expected output. This pattern can be provided either as a printf-like string or a regular expression.
--SKIPIF--: You can add this section to describe when a test should be skipped. This can be used to check for a specific platform or the presence of a required PHP extension.
--CLEAN--: When the test creates some temporary artifacts (like files on disk) the code in this section can clean them up.
--INI--, --ENV--: Sometimes a test needs to run with specific settings like php.ini directives or environment variables. You can specify those settings in these two blocks.

More information about the PHPT test structure can be found on the website of the PHP QA team.

Learning to test with PHPT

When learning to test, we can see the frugal feature set of PHPT as an advantage. It allows one to focus on learning the concepts of testing and testability without getting lost in features of a specific testing framework. These features are useful for developers that are already accustomed to testing. Still it is better to build understanding of how they work under the hood first. In fact, juniors will probably find a way to emulate them in a ‘naive’ way using concepts they already know:

Want to test that a function throws an exception? Use a try-catch. No need for expectException() yet.
Have multiple tests that look similar, but with different data? Put them in a for- or foreach-loop. This paves the way for a data provider later on.
Sharing some initialization logic between tests? Move it to a function. Later on such a function can become a setUp-method.
Need a test double? Create a dummy implementation within the test yourself. This helps to learn what a mocking library does under the hood. It also encourages keeping interfaces small and method chains short. Patterns for the different types of test doubles will emerge more naturally. We don’t have them all coming from createMock() anymore. This helps juniors to learn how e.g. a stub differs from a mock.
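To give an impression, the script below is a sketch of what the --FILE-- section of such a 'naive' test might contain. The greeting function, ClockInterface and FixedClock are hypothetical and exist only for this example.

```php
<?php
// Hypothetical code under test.
interface ClockInterface
{
    public function currentHour(): int;
}

function greeting(ClockInterface $clock): string
{
    $hour = $clock->currentHour();
    if ($hour < 0 || $hour > 23) {
        throw new InvalidArgumentException("Invalid hour: $hour");
    }
    return $hour < 12 ? 'Good morning' : 'Good afternoon';
}

// A hand-written stub instead of createMock().
final class FixedClock implements ClockInterface
{
    private $hour;

    public function __construct(int $hour)
    {
        $this->hour = $hour;
    }

    public function currentHour(): int
    {
        return $this->hour;
    }
}

// A foreach-loop instead of a data provider.
foreach ([9, 15] as $hour) {
    var_dump(greeting(new FixedClock($hour)));
}

// A try-catch instead of expectException().
try {
    greeting(new FixedClock(42));
    echo "no exception\n";
} catch (InvalidArgumentException $e) {
    var_dump($e->getMessage());
}
```

The matching --EXPECT-- section would then list string(12) "Good morning", string(14) "Good afternoon" and string(16) "Invalid hour: 42", each on its own line.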

As a bonus, PHPT tests make it easy to write characterisation tests. Take any code fragment that exercises the system under test in some way and put it in the test. Then run it to get the current output and register that as the expected output. This strategy provides a great opportunity to learn how one can use characterisation tests as an aid to refactor legacy code. It also teaches how they are brittle and should eventually be replaced by tests that properly specify desired behavior.

Using PHPT tests with PHPUnit

One little-known feature of PHPUnit is that it actually has built-in support for PHPT tests out of the box. All we have to do to enable this is to add a directory with the .phpt suffix to the phpunit.xml configuration file:

<?xml version="1.0" encoding="UTF-8"?>
<phpunit bootstrap="vendor/autoload.php">
    <testsuites>
        <testsuite name="My Test Suite">
            <directory>test</directory>
            <!-- (line below added) -->
            <directory suffix=".phpt">test</directory>
        </testsuite>
    </testsuites>
    <!-- ... -->
</phpunit>

If we have done so, PHPUnit will also look for .phpt files in the test directory, and execute them as PHPT tests. With this configuration we can start mixing ‘normal’ PHPUnit tests and PHPT tests. Still we can run them as if there were no difference. We just support both test formats side-by-side without any extra infrastructure or tooling.

This means developers experienced with testing can still write their usual PHPUnit tests. Developers that have recently started out with testing now have a simpler way of writing tests though. Instead of the following test from the PHPUnit manual

<?php
use PHPUnit\Framework\TestCase;

class StackTest extends TestCase {
    public function testPushAndPop() {
        $stack = [];
        $this->assertSame(0, count($stack));

        array_push($stack, 'foo');
        $this->assertSame('foo', $stack[count($stack)-1]);
        $this->assertSame(1, count($stack));

        $this->assertSame('foo', array_pop($stack));
        $this->assertSame(0, count($stack));
    }
}

you can write this equivalent PHPT test:

--TEST--
Array as a stack
--FILE--
<?php
$stack = [];
var_dump(count($stack));

array_push($stack, 'foo');
var_dump($stack[count($stack)-1]);
var_dump(count($stack));

var_dump(array_pop($stack));
var_dump(count($stack)); 
?>
--EXPECT--
int(0)
string(3) "foo"
int(1)
string(3) "foo"
int(0)

Notice the differences in boilerplate/prerequisites to understand the two tests. For the PHPUnit test one needs to know about the PHPUnit TestCase class and have a basic understanding of OOP. Also you have to know that methods starting with test are invoked by the PHPUnit test runner. Lastly, you need to be able to pick the appropriate assertion method.

With the PHPT test juniors can take any existing snippet of PHP code that produces some output and turn it into an automated test. They might already use such a script for manual testing. Now they can convert it to a test that runs with the rest of the testsuite.

Limitations

Of course, most of us do not write tests using PHPT, but use a framework like PHPUnit, phpspec or atoum. There are good reasons for that. Using the PHPT format and running it with PHPUnit has some serious limitations:

Features. Most test frameworks have several helpful features built-in. These help to reduce the amount of code to write for certain tests or clarify why certain tests failed. Think of data providers and integration with mocking libraries. But there are also test dependencies, ways of expecting exceptions and specialized assertion methods. We have discussed that most of these can be ‘emulated’ in tests themselves. Still, having these in our test framework reduces the amount of boilerplate we need to write. It makes experienced testers much more productive.
IDE support. There is no real support for PHPT tests in most IDEs. With proper file associations the code between <?php ?> tags is usually still highlighted. The PHPT sections will not be, though. Additionally, I don’t know of any IDE that can quickly run a single .phpt file as a PHPT test from the editor.
Unsupported sections. The PHPUnit test runner has only limited support for PHPT tests. Several PHPT sections are not supported by this runner, most of them related to simulating HTTP input.

Conclusion

I see no use in switching to PHPT tests if you are already experienced in writing tests. Still, using PHPT tests might be a nice way for less experienced developers to get started with testing. With little boilerplate and a small feature set, it helps them to focus on ways of testing and writing testable code. That is difficult enough in itself, and deserves their full attention. More advanced features and constructs will arise naturally and help to discover how a testing framework works under the hood.

Most important: in PHPUnit, PHPT tests can be enabled alongside ‘normal’ tests with just a single line of configuration. Such a strategy provides a smooth upgrade path from basic PHPT tests to a more mature testing framework. It does not even require additional tooling. Help your developers to start testing, enable PHPT tests!

Enjoyed reading this post?

My colleagues and I regularly write webdev-related blog posts like this to share the things we learn or discover. You can find all our posts on the Moxio blog. Get notified of new posts by following me or Moxio on Twitter or by subscribing to our RSS feed.

Moving individual MySQL tables on disk

Arnout Boks — Mon, 12 Feb 2018 23:00:00 +0000

You may encounter the situation where you want to move one single MySQL database table to another (location on) disk, e.g. to free up disk space. It turns out that this process is far from straightforward. In this post I will describe several of our failed approaches (since failures are a great opportunity for learning), and the solution we eventually came up with.

Why move tables on disk?

A while ago we received an alert that disk space on one of our servers at Moxio was gradually running low. We constantly monitor our servers for factors that could threaten normal operation, and receive a first alert when the free disk space drops below 30% of the total disk size. In that way we still have plenty of time to act upon it before the disk is actually full.

Graph of free disk space before moving tables

Usually in such situations we could free up sufficient disk space by deleting old backups and temporary files, or moving certain file storage directories off the main hard drive to an external storage array. This time almost all disk space was taken by the MySQL database server however. Some tables had been growing steadily over time, eating up more and more disk space. Since this was data we had to keep, our only sustainable solution was to move these tables to another disk.

The desired situation

Our ideal solution would be to move only the large individual tables to the external storage array. With that approach we could keep the other tables in the database on the (much faster) SSD disk, not sacrificing performance unnecessarily. So is this even possible with MySQL?

Historically, MySQL used to store all table data and indices in the system tablespace, represented by one or more ibdata files on disk. This means that data from multiple databases and tables was stored in the same file, making it impossible to move one of them to another location. Then MySQL 4.1.1 introduced file-per-table tablespaces for InnoDB with the innodb_file_per_table setting, which would store data and indices for newly created tables in a separate .ibd file per table. This setting became enabled by default in MySQL 5.6.6.

When using file-per-table tablespaces, it is possible to use the DATA DIRECTORY = 'path' clause with CREATE TABLE to place the data for the table outside the main MySQL data directory as of MySQL 5.6. This means that a setup as desired would be technically possible.

Failed approaches

In our case we were dealing with existing tables however. We tried several unsuccessful approaches to move these to another directory before eventually finding a satisfying solution.

Changing the data directory after creation

Since a DATA DIRECTORY can be specified to change a table’s storage location when creating the table, it seems logical that it would also be possible to change this location after creation by using this same option with ALTER TABLE. Indeed, ALTER TABLE syntactically supports DATA DIRECTORY as a table option. The documentation explicitly states however that this option is ignored (except when partitioning). It thus seems impossible to change DATA DIRECTORY for an existing table.

Symlinking the data files

Another approach we came up with was manually moving the table’s data files to the other disk, symlinking them back to their original location inside MySQL’s main data directory. According to the documentation however, although MySQL supports symlinking entire database directories or individual MyISAM tables, using symbolic links to InnoDB tables is not supported and may cause strange problems.

Creating a copy in the desired location

Since DATA DIRECTORY can only be specified when creating a table, we also tried creating a copy of the table in the desired location, copying over all data from the old table to the new copy and renaming the new table to take the place of the old one. This would look somewhat like:

SHOW CREATE TABLE `table_name`;

This returns the CREATE TABLE statement that would create the table structure of table_name. We now execute that exact statement, but substitute a new table name and append a DATA DIRECTORY clause:

CREATE TABLE `table_name_new` /* ... */ DATA DIRECTORY='/path/to/desired/location';

Now we have an empty table with the same schema as table_name. We then copy all data over from the existing table to the new one and issue an atomic rename to swap the two tables:

INSERT INTO `table_name_new` SELECT * FROM `table_name`;
RENAME TABLE `table_name` TO `table_name_old`, `table_name_new` TO `table_name`;

Here we ran into problems however. When running the INSERT INTO ... SELECT query, MySQL failed with the error message

ERROR 1206 (HY000): The total number of locks exceeds the lock table size

It turns out that when using INSERT INTO ... SELECT, MySQL needs to set locks on the rows read from the source table to ensure proper replication. These locks take up space in the InnoDB buffer pool. The size of this pool is configurable using innodb_buffer_pool_size, but is set to 128 MB by default. Since the table we’re trying to move is quite large (more than 50 million rows), MySQL is bound to run out of space for holding locks.

Even if we could sufficiently increase innodb_buffer_pool_size to make this work, this approach is quite inefficient once you think about it. When inserting data into the new copy, MySQL has to serialize the data to disk and build indices for it, which takes a lot of time. We already have the serialized data and indices however: they’re right there for the original table! Instead of letting MySQL recreate the entire index and data file from scratch, we should look for a way to re-use the table data that is already there.

The solution

Eventually we found a solution for smoothly moving tables on disk, based on a guide for copying tablespaces to another MySQL instance in the MySQL manual. The solution comes down to moving the existing tablespace on disk, dropping and re-creating the table with the desired DATA DIRECTORY, and then re-attaching the saved tablespace to the new table.

Step-by-step, this looks as follows. First we ensure that all data is flushed from MySQL’s caches and buffers to the tablespace on disk:

FLUSH TABLES `table_name` FOR EXPORT;

FLUSH TABLES also locks the table for the duration of the connection or until we unlock it. The lock ensures the data in the table cannot change while we are moving the files. To maintain these locks we should keep open the MySQL connection in which we ran FLUSH TABLES .... In a new terminal window we move the tablespace files to a temporary location. In this case we use our home directory:

$ mv /var/lib/mysql/database_name/table_name.{ibd,cfg} ~

Now returning to our open MySQL session, we can release the locks (the tablespace has been safely put away in a consistent state), drop and re-create the table in its desired location:

UNLOCK TABLES;
SHOW CREATE TABLE `table_name`;
DROP TABLE `table_name`;
CREATE TABLE `table_name` /* ... */ DATA DIRECTORY='/path/to/desired/location';

This will also create fresh tablespace files on disk (in the desired location) for the newly created table. We do not want these (because we want to put back our old tablespace files), so we discard this new tablespace:

ALTER TABLE `table_name` DISCARD TABLESPACE;

At this moment we can copy the saved original tablespace files to the location where MySQL now expects them, making sure to preserve ownership and permissions:

$ cp -a ~/table_name.{ibd,cfg} /path/to/desired/location

The final step is to let MySQL import the original tablespace files back from disk:

ALTER TABLE `table_name` IMPORT TABLESPACE;

Except for copying the tablespace files to the other disk, this whole process is very fast, as MySQL just takes the original table data and indices and does not have to rebuild these.

Recap

After several failed attempts (in which we did learn a lot about MySQL internals though), we eventually found a solution for efficiently moving tables to another location on disk. Using this approach we moved some of our largest and fastest-growing tables to an external storage array, freeing up the disk space necessary to keep our server going.

Graph of free disk space after moving tables

Another thing to remember: if you know that a database table will become large, plan its location on disk in advance. When creating the table, specifying a custom location on disk (using DATA DIRECTORY) is simple and saves you the hassle of the process described in this post down the road.

PHP Central Europe conference 2017

Arnout Boks — Mon, 18 Dec 2017 19:36:44 +0000

Early November I attended the first edition of phpCE, a new PHP community conference in Central Europe, originating from a merger between PHPCon Poland and Brno PHP conference. In this blog post I would like to share some of my experiences and things I have learned during that event, in terms of interesting content, delivering two talks myself, and interactions with the community.

Content

A conference is nothing without great content, and phpCE surely lived up to its expectations. Of course I cannot describe everything I've learned in a single blog post, but there were certainly some personal highlights I'd like to share.

Andreas Heigl kicked off the conference with his keynote 'How to get the most out of a tech conference!', containing some practical tips for both first-time and seasoned conference visitors. Not only did he give advice for attending talks, but also for social interactions in the hallway track (including the Pac-Man rule).

Sebastian Bergmann presented 'Domain-Specific Assertions', about how using the ubiquitous language of the domain in PHPUnit assertions can help to make them more understandable to your coworkers, your future self, and even non-developers. His talk wasn't actually limited to assertions, making a plea for understandable programs (and against the term 'code') in general.

In 'The GDPR is coming, are you ready?', Michelangelo van Dam talked about the General Data Protection Regulation from the EU, which will be effective as of May 25, 2018. As this topic had not received that much attention in the Netherlands yet (as far as I know), for me this talk really was an eye-opener to start reading up on it and taking measures. Michelangelo's talk already contained a lot of practical hints for implementing compliant systems.

Another really interesting session was Nikola Poša talking about best practices for exception handling in 'Journey through "unhappy path"'. I actually wasn't able to attend this talk myself (due to speaking at the same time), but I heard a lot of really positive reactions and talked a bit with Nikola about this subject afterwards. We really value well-designed exception handling at Moxio (especially with the research Tom has done in his thesis project), so it's nice to see this topic getting some well-deserved attention at conferences.

Microservices and event-driven architectures were popular topics at phpCE, with both Mariusz Gil ('Modeling complex processes and time with Saga pattern') and Christopher Riley ('Microservices vs The Distributed Monolith') speaking about this subject. They both described how microservices done right require an asynchronous event-based approach, and how failure handling in such an architecture (rather than trying distributed, long-lived transactions) means embracing eventual consistency. The Saga pattern can be seen as a recipe for failure handling, describing actions to take for failure events at different steps in the process.

Speaking

For me, the main reason for attending phpCE 2017 was as a speaker, having been invited to present my talk 'Getting started with PHP core development' (slides). In this talk I described my own journey to my first contribution to the PHP programming language itself, from encountering a bug to writing a test, fixing the C source code and patching the documentation. With this talk I wanted to show, based on my own experiences, how any PHP programmer can contribute something back to the PHP project, even without any experience with the PHP core and/or the C programming language in which it is written. Based on the reactions I heard, I hope that some of the people attending this talk will have made their first contribution to PHP by now.

Eventually I ended up doing a second session, filling in for another speaker who had to cancel. In this vacant slot I presented 'Introduction to the Semantic Web' (slides), a talk I did earlier at DPC17. It aims to show the audience the strengths and limitations of RDF, OWL and the other Semantic Web standards from W3C, which open up possibilities for a web of linked data that can be consumed by smart agents. After the session I got some really interesting questions about representing non-factual data in RDF, which can be done by a technique called reification. I will definitely include this topic in an updated version of my talk.

Community

Although phpCE had a lot of interesting content, good content can also be found in a lot of other places: in books, videos and blog posts. What really sets a great conference apart is the interactions with fellow community members. From that viewpoint, the remoteness of the conference venue (in Ossa, about an hour's drive from Warsaw) was actually ideal. While at other conferences quite some attendees leave the venue after the day programme (to spend the night at home or at a hotel downtown), at phpCE almost all delegates stayed in the Ossa hotel. This gave many opportunities for great discussions and socializing with fellow developers before and after the main conference programme. I got to meet new friends and gained many interesting insights during the meals, late night drinks in the hotel bar, and the newly discovered sport off-by-one bowling.

A disadvantage of a remote location can be that there isn't actually much to see around there. To make up for that, the phpCE team organised an opening day for speakers in Warsaw the day before the conference. We spent the day with a tour guide and an old bus from the communist era, walking and driving around the city center. It was a great occasion to see some of the cultural and historical highlights of Warsaw, learn a thing or two about Poland, and meet fellow speakers before the start of the conference.

To conclude

I really enjoyed attending phpCE 2017. The conference was packed with great content from which I learned a lot, I met many really nice people, and just generally had a great time. I can highly recommend attending the next edition of this conference, which will be in Prague in the fall of 2018.

This post was originally published on the Moxio blog.

On type safety without generics, and the role of package design

Arnout Boks — Fri, 02 Jun 2017 11:23:24 +0000

Despite recent discussions in the PHP community about whether type hints are to be considered 'visual debt' or not, at Moxio we still strongly value adding types to our code. Writing type-safe code lets us catch bugs early, enables static analysis, and serves a self-documenting purpose. Still it can be a challenge to write type-safe code in PHP, especially as it lacks a feature known as 'generics'. In this blog post I will show how (lack of) generics influences type-safe design, how parameter types and return types may change when extending a class or interface, and how we can keep our package design sound while doing so.

Extension and return type hints

Suppose we have an interface that represents a file, from which we can get the raw contents. Instances of this interface are created by a file reader, which accepts a filepath and returns an object corresponding to that file on disk:

<?php
interface File {
    public function getContents(): string;
}
interface FileReader {
    public function readFile(string $filepath): File;
}

These interfaces (and their implementations) may be part of our own code, or they could be defined by some vendor package. Either way, we would like to extend File with some functionality specific for PHP files, like retrieving its PHP Abstract Syntax Tree (AST):

<?php
interface PhpAst {
    /* (omitted) */
}
interface PhpFile extends File {
    public function getAst(): PhpAst;
}

Now how does this relate to our FileReader interface? What should the interface for a class that reads and returns PHP files look like? Can we use such a PHP file reader in places where a FileReader is expected?

With generics

Languages that have generics, like Java, allow an elegant solution to this problem. We can just parameterize the FileReader interface with the type of file returned:

interface PhpAst {}
interface File {
    public String getContents();
}
interface PhpFile extends File {
    public PhpAst getAst();
}

interface FileReader<T> {
    public T readFile(String filepath);
}

Now a reader that returns instances of PhpFile implements FileReader<PhpFile>, while readers that (are only guaranteed to) return more generic File objects are a FileReader<File>. If some code needs just any (generic) file reader it can just declare it as FileReader<? extends File>. It can then accept both FileReader<File> and FileReader<PhpFile> (or any other reader that returns a subtype of File), having the guarantee that the objects returned from it are always a File and thus at least have a getContents() method.

Without generics

In languages without generics, like PHP, this is a bit more difficult. We could of course just create a separate interface for a PHP file reader:

<?php
interface PhpFileReader {
    public function readFile(string $filepath): PhpFile;
}

However, can we declare such an interface as an extension of FileReader, and thus pass it wherever a FileReader is required, despite the signature of readFile not being exactly equal? In (type) theory, the answer would be 'yes'. After all, a FileReader is something obeying the contract that we can call readFile() on with a filename to receive a File, in which a File is an object guaranteed to have a getContents() method returning a string. Our PhpFileReader::readFile() method returns a PhpFile, which is (a particular specialization of) a File. This makes a PhpFileReader obey the contract of a FileReader.

This type-theoretical principle is called covariance: when extending a type (class or interface), we are allowed to 'tighten' the types of the values returned by each method (to a more specific subtype) without breaking its contract. Consumers calling such a method when they expect to deal with the parent type will still get an object of the type they desire, having all the methods they expect to find on that object. Subtypes may thus be stricter in what they return than their parent types.

Does this mean we can just write interface PhpFileReader extends FileReader? The practical answer is, unfortunately, 'no'. PHP only supports covariance when extending or implementing classes or interfaces in the specific situation where the parent type has no return type hint on the method and the subtype adds one. We can view this as tightening the return type from literally anything to a specific type. In all other situations, it requires any return type hints to be identical between the parent and the child. This does not allow us to use our PhpFileReader where a FileReader is required without removing type hints, making our code less type-safe. I will show a solution to this problem in a while. First, let's look at how subtypes affect the input parameters of methods.
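To make the one permitted case concrete, here is a minimal sketch. The names UntypedFileReader, TypedFileReader, InMemoryFile and InMemoryFileReader are hypothetical and only exist for this illustration; the parent interface deliberately omits the return type hint so that the subtype is allowed to add one.

```php
<?php
interface File {
    public function getContents(): string;
}

// Hypothetical parent interface: no return type hint on readFile().
interface UntypedFileReader {
    public function readFile(string $filepath);
}

// Allowed: the subtype tightens "literally anything" to a specific return type.
interface TypedFileReader extends UntypedFileReader {
    public function readFile(string $filepath): File;
}

// Hypothetical in-memory implementations, just to have something to call.
final class InMemoryFile implements File {
    private $contents;

    public function __construct(string $contents) {
        $this->contents = $contents;
    }

    public function getContents(): string {
        return $this->contents;
    }
}

final class InMemoryFileReader implements TypedFileReader {
    public function readFile(string $filepath): File {
        return new InMemoryFile("contents of " . $filepath);
    }
}

// A TypedFileReader can be used wherever an UntypedFileReader is expected.
$reader = new InMemoryFileReader();
$contents = $reader->readFile("example.txt")->getContents();
```

The price of this trick is visible in UntypedFileReader: consumers of the parent interface get no guarantee about what readFile() returns, which is exactly the loss of type safety described above.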

Extension and parameter type hints

Suppose that in our system we also have rules operating on files, e.g. for checking coding standards or detecting bugs. We could model such a rule on a generic file using a type-safe interface like this:

<?php
interface FileRule {
    public function check(File $file): bool;
}

Rules implementing this interface can retrieve the contents of the file they receive using getContents(), and then for example check for "@todo" markers or trailing whitespace. But what if we want to write rules that are specific to PHP files, using the more powerful information available inside the AST?

With generics

In languages with generics we have an easy solution: we parameterize the interface again, this time by the type of the file argument we intend to check:

interface FileRule<T> {
    public boolean check(T file);
}

A rule that operates on PHP files then implements FileRule<PhpFile>, can only be called with instances of PhpFile and has access to the getAst() method on the file object. A rule that checks any type of file can implement FileRule<File> and will accept any File instance, but will not know about methods like getAst(), which are specific to the PhpFile subtype.

If part of our code needs to accept (e.g. as a parameter) a rule that can check a PHP file, this must include instances of FileRule<File>. After all, since a PhpFile is just a specific example of a File, it can be given in any place where a File is expected. This means that a FileRule<File> can also check a PhpFile, and thus is a valid replacement for a FileRule<PhpFile>. To allow the rule parameter to accept both a FileRule<PhpFile> and a FileRule<File> we can declare it as a FileRule<? super PhpFile>, i.e. "any rule that will check a PhpFile".

Without generics

In PHP we cannot resort to the solution we would have used in a language that does support generics. Also for this situation we can create a separate interface:

<?php
interface PhpFileRule {
    public function check(PhpFile $file): bool;
}

Again we can ask what the relationship between FileRule and PhpFileRule is. Is PhpFileRule an extension (subtype) of FileRule? We can see fairly easily that it isn't. After all, a FileRule is something that can check any File object, while PhpFileRule will not accept (for example) a PythonFile. This violates the Liskov Substitution Principle and establishes that PhpFileRule is not a proper subtype of FileRule.

What most people find confusing and surprising is that the (type-theoretical) relationship between FileRule and PhpFileRule is actually the other way around: FileRule is a subtype of PhpFileRule! It makes sense though, given that FileRule will accept anything that a PhpFileRule accepts as a parameter (i.e. all instances of PhpFile, which are all a File by inheritance). This type-theoretical principle is called contravariance: when extending a type we are allowed to loosen the types of all input parameters of methods in the subtype. Subtypes thus may be more liberal in what they accept than their parent types.

Does that mean that in practice we could (and should) write interface FileRule extends PhpFileRule in PHP? The answer is (two times) 'no'. First, PHP does not support contravariance of method parameters when extending or implementing a class or interface. This means that, for now, the type hints on method parameters need to be identical between parent type and subtype. An RFC which will allow one specific type of contravariance (completely dropping the type constraint from a parameter, allowing any input value) has been accepted for PHP 7.2. This RFC is mainly intended for library authors to start adding scalar type hints to interfaces without breaking subclasses. Full covariance and contravariance support in PHP is not expected in the short term, as their implementation requires changes to autoloading.
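As an illustration of the single form of contravariance that the accepted RFC permits, the sketch below drops the parameter type entirely in an implementing class. It assumes PHP 7.2 or later; the class name AcceptAnythingRule is hypothetical and PhpFile is reduced to an empty extension of File to keep the example short.

```php
<?php
interface File {
    public function getContents(): string;
}
interface PhpFile extends File {
}
interface PhpFileRule {
    public function check(PhpFile $file): bool;
}

// Allowed as of PHP 7.2: the implementation omits the parameter type,
// widening it from PhpFile to "any value".
class AcceptAnythingRule implements PhpFileRule {
    public function check($file): bool {
        // Without a type hint we have to verify the argument ourselves.
        return $file instanceof File
            && strpos($file->getContents(), "@todo") === false;
    }
}

// Hypothetical PhpFile instance, created inline for the example.
$file = new class implements PhpFile {
    public function getContents(): string {
        return "echo 'Hello';";
    }
};

$rule = new AcceptAnythingRule();
$passed = $rule->check($file);
```

Note that widening to the intermediate type File (instead of dropping the type altogether) is not covered by this RFC, which is why it does not solve the FileRule versus PhpFileRule problem.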

Package Dependency Principles

Still, even if PHP supported it, it would not be a good idea to declare that FileRule extends PhpFileRule. This has to do with the principles of package design as described by Robert "Uncle Bob" Martin. If we think of the functionality and interfaces for generic files as being in one package and the ones for PHP files belonging to another, the 'php files' package has a dependency on the 'generic files' package, as PhpFile extends File. If we would let FileRule extend PhpFileRule we would also introduce a dependency the other way around, creating a package dependency cycle and thus violating the Acyclic Dependencies Principle.

The Stable Dependencies Principle ("Depend in the direction of stability") and the Stable Abstractions Principle ("Abstractness increases with stability") point us in the 'correct' direction of dependence. Since the 'generic files' package is more abstract than the 'php files' package (and thus more stable), 'php files' can depend on 'generic files'. Dependencies the other way around should be avoided. We can even imagine that the 'generic files' package is a third party library that we use. We are in no position to ask that library's author to make his FileRule interface extend our PhpFileRule interface.

Adapter pattern to the rescue!

Then how are we supposed to use a FileRule where a PhpFileRule is expected, or pass a PhpFileReader to a method that wants a FileReader? It turns out we can solve all our problems with a simple Adapter design pattern. We create two small classes. One adapts FileRule to the interface of PhpFileRule, and the other adapts PhpFileReader to a FileReader:

<?php
class FileRuleAsPhpFileRule implements PhpFileRule {
    private $file_rule;

    public function __construct(FileRule $file_rule) {
        $this->file_rule = $file_rule;
    }

    public function check(PhpFile $file): bool {
        return $this->file_rule->check($file);    
    }
}

<?php
class PhpFileReaderAsFileReader implements FileReader {
    private $php_file_reader;

    public function __construct(PhpFileReader $php_file_reader) {
        $this->php_file_reader = $php_file_reader;
    }

    public function readFile(string $filepath): File {
        return $this->php_file_reader->readFile($filepath);
    }
}

If we now want to use a FileRule where a PhpFileRule is asked for we can just pass new FileRuleAsPhpFileRule($file_rule). Similarly we can use new PhpFileReaderAsFileReader($php_file_reader) to have a reader for PHP files act as a generic file reader. This allows us to adapt between our related interfaces in a type-safe way and, since the adapters are both part of the 'php files' package, prevents dependency cycles.
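To show the adapter in action, here is a self-contained sketch. The interfaces and FileRuleAsPhpFileRule are repeated from above; NoTodoMarkersRule, InMemoryPhpFile and passesAllRules() are hypothetical names introduced only for this example.

```php
<?php
interface File {
    public function getContents(): string;
}
interface PhpAst {
}
interface PhpFile extends File {
    public function getAst(): PhpAst;
}
interface FileRule {
    public function check(File $file): bool;
}
interface PhpFileRule {
    public function check(PhpFile $file): bool;
}

// The adapter from the post, repeated to keep the sketch self-contained.
class FileRuleAsPhpFileRule implements PhpFileRule {
    private $file_rule;

    public function __construct(FileRule $file_rule) {
        $this->file_rule = $file_rule;
    }

    public function check(PhpFile $file): bool {
        return $this->file_rule->check($file);
    }
}

// Hypothetical generic rule: passes when the file has no "@todo" markers.
class NoTodoMarkersRule implements FileRule {
    public function check(File $file): bool {
        return strpos($file->getContents(), "@todo") === false;
    }
}

// Hypothetical PhpFile implementation backed by a string.
class InMemoryPhpFile implements PhpFile {
    private $contents;

    public function __construct(string $contents) {
        $this->contents = $contents;
    }

    public function getContents(): string {
        return $this->contents;
    }

    public function getAst(): PhpAst {
        return new class implements PhpAst {};
    }
}

// Hypothetical consumer that insists on PHP-specific rules.
function passesAllRules(PhpFile $file, PhpFileRule ...$rules): bool {
    foreach ($rules as $rule) {
        if (!$rule->check($file)) {
            return false;
        }
    }
    return true;
}

$rule = new FileRuleAsPhpFileRule(new NoTodoMarkersRule());
$clean = passesAllRules(new InMemoryPhpFile("echo 'done';"), $rule);
$dirty = passesAllRules(new InMemoryPhpFile("// @todo finish this"), $rule);
```

The consumer never learns that it is running a generic rule, and the generic rule never learns about PhpFile: all knowledge of both sides lives in the adapter.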

Of course there is a small downside to this approach. Using this pattern requires two additional adapter classes and two PHP-file-specific interfaces to be written (when compared to the implementation with generics) and the delegation calls inside the adapters incur a very small runtime overhead. We consider these disadvantages negligible however when compared to the benefits of type safety and proper package dependencies.

Summary

When extending a type, the subtype may widen the types of input parameters (contravariance) and narrow down the types of return values (covariance). In other words, the subtype may be more liberal in what it accepts and more strict in what it returns.

In languages without generics, type-safe programming may require creating separate interfaces that are covariant or contravariant.
PHP only allows covariance and contravariance in a very limited set of cases.
Adding an explicit extends clause for contravariance situations may break package design principles.
An Adapter pattern is an easy solution to overcome these situations.

This post was originally published on the Moxio blog.

Review Roulette: Everyone is a winner!

Arnout Boks — Thu, 30 Mar 2017 22:00:00 +0000

In an earlier blog post I introduced our idea of Review Roulette, a process of randomized code reviews with the aim to foster learning and increase collective code ownership. I explained that we would try this idea out as an experiment for two months and evaluate afterwards. I also promised to share the results of that evaluation. In this post I will do so, and describe the steps we took to make Review Roulette work even better for us.

Evaluate…

For evaluation we asked all participants to fill out a short questionnaire with questions from three different viewpoints:

Experiences as reviewer
Experiences as reviewee
General impressions

The questions focused on the (perceived) usefulness of Review Roulette as a means of finding defects, improving internal quality and sharing knowledge. Additionally we asked about some aspects we identified as potential obstacles for a well-functioning process, like time consumption and communicational aspects.

From the results we can conclude that, overall, Review Roulette received really positive feedback. 100% of the participants agreed that we should continue the experiment as a structural process, and the statement “I think that doing Review Roulette is useful” received a score of 4.75 out of 5 on average. For us, this was sufficient to decide to continue doing Review Roulette, which we still do up to this day. Based on the feedback, we made some small adaptations, though.

…and adjust

In the questionnaire results we noticed that the statement “The commits I got assigned were useful to review” received an average score of only 3.38 out of 5, with 63% of participants scoring it a 3 or lower. This was mainly caused by commits with trivial one-line textual changes being randomly selected for review. To improve on this, we added a filter that only selects commits with at least 3 changed lines in relevant source files (php, js, css, json, etc.) as eligible for review. Implementation of such a filter was fairly straightforward using diffstat to analyze the commits. This should weed out most trivial changes, leaving only the more interesting commits for review. As a side-effect, this also fixes the issue where an import of a binary file was put up for review.
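A filter along these lines could look like the sketch below. It is not our actual implementation: instead of diffstat it assumes the tab-separated output of git show --numstat (added lines, deleted lines, path, with "-" for binary files), it counts added plus deleted lines as 'changed lines', and the function name isEligibleForReview() is made up for this example.

```php
<?php
/**
 * Decides whether a commit has enough changed lines in relevant source
 * files to be worth reviewing.
 *
 * @param string   $numstat         Lines of "added<TAB>deleted<TAB>path"
 * @param int      $minChangedLines Threshold (3 in our setup)
 * @param string[] $extensions      File extensions that count as source code
 */
function isEligibleForReview(
    string $numstat,
    int $minChangedLines = 3,
    array $extensions = ["php", "js", "css", "json"]
): bool {
    $changed = 0;
    foreach (preg_split('/\R/', trim($numstat)) as $line) {
        $parts = explode("\t", $line, 3);
        if (count($parts) !== 3) {
            continue; // not a numstat line
        }
        list($added, $deleted, $path) = $parts;
        if ($added === "-" || $deleted === "-") {
            continue; // binary file, no line counts available
        }
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (!in_array($extension, $extensions, true)) {
            continue; // not a relevant source file
        }
        $changed += (int) $added + (int) $deleted;
    }
    return $changed >= $minChangedLines;
}

// A one-line textual change and a binary import are both filtered out.
$trivial = isEligibleForReview("1\t1\tsrc/Label.php");
$binary = isEligibleForReview("-\t-\tbin/tool.exe");
$relevant = isEligibleForReview("2\t1\tsrc/Foo.php\n5\t0\tREADME.md");
```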

Another obstacle that we identified was the time needed for doing Review Roulette. Although we agreed on a time limit of one hour per week when starting the experiment, the statement “I was able to find sufficient time to review the commits assigned to me” only scored 3 out of 5 on average, with 63% of participants scoring it a 3 or lower. This was mainly influenced by our relatively large number of student employees, who work 1 or 2 days a week besides their studies. While one hour a week may be almost negligible in a full-time work week, it is a significant time investment when your work week is only 8–16 hours. Based on this feedback, we introduced a bi-weekly schedule for part-timers (only assigning them a review once every two weeks), while keeping the full-time employees on the original weekly schedule. This ensures that we can still benefit from bi-directional knowledge sharing with our student employees, while limiting the amount of time they need to spend on it.

One last observation

An interesting observation from the questionnaire results was that the perceived usefulness of Review Roulette was higher from the perspective of the reviewee than from that of the reviewer, both in terms of bugs found (3.38 vs. 2.63 out of 5) and improved design (4.50 vs. 3.13 out of 5). The lesson to draw from this is that we must not underestimate the value of the feedback that we give to others. Things that may seem trivial or charted territory to ourselves may be entirely new to colleagues. This is exactly the type of knowledge sharing that we aim to amplify using Review Roulette.

Conclusion

This experiment and its evaluation bear a lot of similarity to agile software development: it is often useful to start with a small and simple experiment, then evaluate and adjust. We have been using Review Roulette with the mentioned improvements for about 4 months now, and are very happy with the added value it brings.

Originally published at www.moxio.com on March 30, 2017.

Detecting hidden bugs in PHP code using PHP_CodeSniffer

Arnout Boks — Thu, 01 Dec 2016 08:00:00 +0000

Although PHP serves us well as a programming language, we cannot deny that some of its behavior can be very surprising. If one is not aware of these pitfalls, this can easily lead to hidden bugs in PHP code. In fact, we have run into a fair share of these issues ourselves, but try to take measures to prevent being struck by them again. In this post we will look into two potential pitfalls in PHP, and how these can be detected and avoided using our open-sourced collection of sniffs for PHP_CodeSniffer.

Fool me once, shame on you

Quick, what does the following code do?

<?php
$chars = str_split("The quick brown fox jumps over the lazy dog.");
foreach ($chars as $char) {
    switch (strtoupper($char)) {
        case "A":
        case "E":
        case "I":
        case "O":
        case "U":
        case "Y":
            continue;
    }

    print $char;
}

It’s not strange to think that this code fragment will print the input string with all vowels removed. However, it will in fact just print the original string, including the vowels (see it for yourself). This behavior is caused by the fact that PHP considers a switch-statement a looping structure for the purpose of continue. The continue thus jumps to the end of the switch-statement (rather than to the end of the foreach-body) and all vowels are still printed. If we want to skip printing the vowels in the example above, we would have to use continue 2 to jump to the end of the foreach-body.
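For comparison, this is what the fixed version could look like. It is only a sketch: the function name stripVowels() is made up, and the result is collected in a string instead of being printed directly so that it is easy to inspect. As a side note, newer PHP versions (7.3 and up) also emit a warning themselves when a continue targets a switch.

```php
<?php
/**
 * Returns the input with all vowels (and the letter Y) removed.
 */
function stripVowels(string $input): string {
    $output = "";
    foreach (str_split($input) as $char) {
        switch (strtoupper($char)) {
            case "A":
            case "E":
            case "I":
            case "O":
            case "U":
            case "Y":
                // Level 1 is the switch itself, level 2 the foreach.
                continue 2;
        }

        $output .= $char;
    }
    return $output;
}

$result = stripVowels("The quick brown fox jumps over the lazy dog.");
// $result is "Th qck brwn fx jmps vr th lz dg."
```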

I recently bumped into this interesting behavior (call it a bug or a feature) in a piece of code. Even though I had seen an example similar to the one above before, and thus should have known this intricacy, I still spent way too much time trying to find out why my code did not produce the desired results.

Fool me twice, shame on me

To prevent getting bitten by this quirk again, we decided to implement something for automated detection of the above situation and warn us about its unexpected behavior. For this purpose we chose to write a custom sniff for PHP_CodeSniffer, which we already used for enforcing coding standards. The token- and scope-based approach used by PHP_CodeSniffer makes it easy to check all switch-cases for top-level continue-statements (i.e. not within a nested looping structure) and see if they have a numeric 'level'-argument. We disallow any such continue-statements without an explicit number of looping levels to jump over:

<?php
for ($i = 1; $i < 10; $i++) {
    switch ($x) {
        case "foo": 
            continue; // NOT OK, probably a bug 
        case "bar": 
            foreach ($a as $k => $v) {
                continue; // OK, inside a nested 'foreach' 
            }
            continue 2; // OK, explicit 'level'-argument 
    } 
}

This warns us about potential hidden bugs like the one above. If we get an error from PHP_CodeSniffer but the behavior of continue is actually what we want (although I cannot imagine why one would not use break in such a case), we can always replace the continue by continue 1 to confirm that we have thought this case through and suppress the error.
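To give an impression of how such a token- and scope-based check works, here is a strongly simplified standalone sketch. It is not the code of our sniff and does not use the PHP_CodeSniffer API; it relies on PHP's built-in tokenizer instead, only understands brace-delimited blocks (no alternative syntax, no braceless loops), and the function name findAmbiguousContinues() is hypothetical.

```php
<?php
/**
 * Returns the line numbers of bare `continue;` statements whose innermost
 * enclosing structure is a switch rather than a loop.
 */
function findAmbiguousContinues(string $code): array {
    $tokens = token_get_all($code);
    $scopes = [];       // one entry per open brace: 'switch', 'loop' or 'other'
    $pending = "other"; // kind that the next opening brace will get
    $parens = 0;        // parenthesis depth, to ignore the ';' inside for (...)
    $lines = [];

    foreach ($tokens as $i => $token) {
        $id = is_array($token) ? $token[0] : $token;

        if ($id === T_SWITCH) {
            $pending = "switch";
        } elseif (in_array($id, [T_FOR, T_FOREACH, T_WHILE, T_DO], true)) {
            $pending = "loop";
        } elseif ($id === "(") {
            $parens++;
        } elseif ($id === ")") {
            $parens--;
        } elseif ($id === ";" && $parens === 0) {
            $pending = "other"; // e.g. after the `while (...);` of a do-while
        } elseif ($id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $scopes[] = "other"; // string interpolation, closed by a plain '}'
        } elseif ($id === "{") {
            $scopes[] = $pending;
            $pending = "other";
        } elseif ($id === "}") {
            array_pop($scopes);
        } elseif ($id === T_CONTINUE) {
            // Innermost enclosing switch or loop, ignoring if/else blocks etc.
            $structures = array_diff($scopes, ["other"]);
            $enclosing = end($structures);
            // First token after `continue` that is not whitespace or a comment.
            $next = $i + 1;
            while (is_array($tokens[$next] ?? null)
                && in_array($tokens[$next][0], [T_WHITESPACE, T_COMMENT], true)) {
                $next++;
            }
            if ($enclosing === "switch" && ($tokens[$next] ?? null) === ";") {
                $lines[] = $token[2];
            }
        }
    }
    return $lines;
}

$source = <<<'CODE'
<?php
for ($i = 1; $i < 10; $i++) {
    switch ($x) {
        case "foo":
            continue;
        case "bar":
            foreach ($a as $k => $v) {
                continue;
            }
            continue 2;
    }
}
CODE;

$flagged = findAmbiguousContinues($source); // [5]
```

Only the continue on line 5 of the sample is reported: the one inside the nested foreach has a loop as its innermost structure, and the last one carries an explicit level.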

Fool me three times, …

If only this was the sole hidden pitfall when working with PHP… Due to all problems that come with non-strict comparisons we already have a check in place to enforce strict comparison operators (=== etc.) over their non-strict counterparts (== etc.). However, there are still some PHP functions that will surprise you with non-strict comparison behavior by default, like in_array and array_search.

That’s why we have also implemented a PHP_CodeSniffer sniff that requires the $strict-parameter to such functions to be set explicitly:

<?php
in_array("foo", [0]); // NOT OK, might introduce hidden bugs
in_array("foo", [0], true); // OK, strict comparison
in_array("foo", [0], false); // OK, you have probably thought about this

Although we can still opt in to the non-strict behavior, this should prevent us from having to deal with the intricacies of non-strict comparison if we have not explicitly asked for them.
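To see what kind of surprise the non-strict default can cause, consider the small sketch below. It uses two numeric strings on purpose: how a non-numeric string like "foo" compares to the number 0 differs between PHP versions (PHP 8 changed this), whereas the behavior shown here is the same in PHP 7 and PHP 8.

```php
<?php
// Both strings are numeric, so the default loose comparison treats them
// as the number 10 and considers them equal.
$looseMatch = in_array("1e1", ["10"]);        // true
$strictMatch = in_array("1e1", ["10"], true); // false

// array_search has the same default: it returns the key of the first
// loosely equal element, or false when nothing matches.
$looseKey = array_search("1e1", ["10"]);        // 0
$strictKey = array_search("1e1", ["10"], true); // false
```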

… release as open-source

To help the rest of the PHP community avoid these pitfalls, we have decided to open-source our PHP_CodeSniffer sniffs. They can easily be added as a development dependency into other projects using the Composer package, either as a standalone ruleset or by integrating them into your own PHP_CodeSniffer standard.

We have a backlog of more PHP pitfalls that we want to implement checks for, so stay tuned (by following us on Twitter or subscribing to our RSS feed) for more sniffs!

Originally published at www.moxio.com.

Introducing Review Roulette

Arnout Boks — Tue, 20 Sep 2016 07:00:00 +0000

At Moxio we recently started an experiment we called ‘Review Roulette’: a process of randomized code reviews. We believe this emphasizes code reviews as a means of bidirectional learning and helps onboarding junior developers, and thus improves upon our previous review ‘policy’. In this post I would like to sketch the background behind this experiment, explain the idea of Review Roulette and present some preliminary results.

Our old situation

We already did a fair share of code reviews before the introduction of Review Roulette. About four years ago we set up a ReviewBoard server on our local network and started using it to discuss and review code. Review would be done in the tool, taking the discussion offline if desired. So far we did not have a formal policy around what and how to review: putting changes up for review was opt-in and the author chose the reviewer(s) and the timing (pre- or post-commit). Sometimes a review request was just a rough idea for discussion; at other times there would be a full implementation to verify.

This approach made it easy to start doing code reviews and ensured that their introduction was more or less resistance-free. It also kept the review burden and bureaucratic overhead low, making it easy to do small fixes and improvements (in the spirit of the Boy Scout Rule) without having to go through a formal review process.

However, such a lax structure also had its disadvantages. In practice, opt-in code reviews meant that most subjects posted for review were the more difficult or ‘controversial’ changes, that were bound to spark a fair amount of discussion (which they regularly did). This gave code reviews too much an air of cumbersome discussions and hard decisions. Moreover, changes that the author thought to be straightforward were usually not reviewed, even though someone else on the team may have spotted some overlooked difficult case. Also, a free choice of reviewers caused most reviews to be assigned to a fixed group of 2–5 more senior developers, creating a sort of unofficial hierarchy of reviewers.

Why do code reviews?

In order to understand the undesirable effects of this situation, let’s take a step backward and look at the reasons to do code reviews. The most obvious motivation is to find and correct bugs. As Steve McConnell notes in his book Code Complete, research has shown code reviews to achieve a 45%-70% defect detection rate, more than any type of automated testing. A second reason is that code reviews can improve design and thus make code more maintainable.

It is not only the code that benefits from reviews. When done properly, code reviews can also make team members learn from each other and let them become familiar with parts of the code base that they do not work on often. Note that this works both ways: just like the original author can benefit from the feedback given by the reviewer, the reviewer can learn from the way the author has approached the problem at hand and discover methods and classes they were unfamiliar with. This all fosters shared code ownership and continuous learning, and makes code reviews a great way to onboard new or junior developers.

Our old review ‘policy’ missed some opportunities regarding this aspect of knowledge sharing. The de facto review hierarchy meant that juniors did not often get the opportunity to review (and thus learn from) code written by more senior developers. To emphasize code reviews as a means of bidirectional learning (as opposed to verification and supervision) requires a sense of reciprocity. Not reviewing changes that the author considered straightforward also missed out on opportunities for knowledge exchange, as a reviewer may see improvements that the author did not consider or (conversely) learn something new from the author’s approach. Although we think that reviewing all changes would consume too much time and be counterproductive, we decided that all changes should be eligible for review.

Review Roulette

The considerations above brought us to the idea of Review Roulette:

Every week, every developer is assigned one random commit (from the previous week) written by a random colleague to review, to the extent possible.

The randomness aspect eliminates any kind of fixed reviewer hierarchy and ensures reciprocity, while also ensuring that every commit is eligible for review, precluding any assumptions about what other team members could learn or teach about a given subject. Another essential aspect is the clause “to the extent possible”. With Review Roulette it is unavoidable that sooner or later someone will be assigned to review a difficult commit in an unfamiliar part of the code base. While that person will probably not understand everything, and will not be able to review the change as thoroughly as someone else would, they can usually still give useful remarks on certain parts of it, point out low-level logic errors and suggest changes to naming and documentation that would make the code easier to understand for an ‘outsider’. Just let them indicate what they could and could not review. A fresh pair of eyes often gives very useful insights that someone knee-deep in that codebase would not notice. This provides circumstances in which every developer can participate: if you write code, you’re in.

We agreed to give this approach a try as an experiment, adding to (not replacing) the code reviews that we already did. To keep a limit on the time invested, especially for larger commits assigned to a reviewer not familiar with that area, we decided to not spend more than an hour per week on our Review Roulette. In a couple of hours I put together a simple script to retrieve all commits from the week before, filter out things like merge commits, generate a random assignment of reviewers to commits and post the review requests to our ReviewBoard server. We run the script every Monday and do the reviews during the following week. Evaluation is planned two months into the experiment, when we will decide whether to continue Review Roulette and whether any adjustments are necessary.
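The heart of such a script, the random assignment, can be sketched in a few lines. This is an illustration rather than our actual script: the function name assignReviews() and the shape of the commit records are assumptions, retrieving commits and posting to ReviewBoard are left out, and the sketch does not prevent the same commit from being handed to two reviewers.

```php
<?php
/**
 * Assigns every developer one random commit written by a colleague.
 *
 * @param array[]  $commits    Records with a 'hash' and an 'author' key
 * @param string[] $developers Names of all participating developers
 * @return string[] Map of reviewer name to the assigned commit hash
 */
function assignReviews(array $commits, array $developers): array {
    $assignments = [];
    foreach ($developers as $reviewer) {
        // Nobody reviews their own work: keep only commits by colleagues.
        $candidates = [];
        foreach ($commits as $commit) {
            if ($commit["author"] !== $reviewer) {
                $candidates[] = $commit["hash"];
            }
        }
        if ($candidates === []) {
            continue; // nothing to review for this developer this week
        }
        $assignments[$reviewer] = $candidates[random_int(0, count($candidates) - 1)];
    }
    return $assignments;
}

$lastWeek = [
    ["hash" => "a1f3", "author" => "alice"],
    ["hash" => "b2c4", "author" => "bob"],
    ["hash" => "b3d5", "author" => "bob"],
];
$assignments = assignReviews($lastWeek, ["alice", "bob", "carol"]);
```

In this example bob can only ever be assigned commit a1f3, while carol (who did not commit anything last week) still takes part as a reviewer.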

Preliminary results

At the time of writing, we have been trying out Review Roulette for 6 weeks, so there is no real evaluation yet (I will blog about that later; UPDATE: I have). Still, there are some personal observations that I would already like to share at this moment:

The vibe around Review Roulette seems to be mostly positive. People are enthusiastic about doing code reviews and sharing views about development.
Most changes, even the ones that seem fairly trivial at first sight, turn out to have something interesting in them if you (are forced to) look closely enough. Even in reviews of simple one-line commits I have seen useful discussions about the surrounding code, UX matters or testing styles arise. It seems that team members really see the reviews as an opportunity to learn and teach and try to get the best out of this interaction.
Some sort of filter on the list of commits eligible for review is still necessary. I have seen an occasion where someone was asked to review a new import of a compiled binary file. Oops!
For myself, I noticed that doing Review Roulette made me take more care with the commit messages I write. I often found myself thinking “If this commit were chosen for review, would my colleague understand it?” and deciding that some more context would be useful.

Overall, I would highly recommend trying out Review Roulette with your team! It is a great chance to deepen everyone's knowledge of the codebase and let both the code and the developers benefit from bidirectional learning.

Originally published at www.moxio.com.
