goaty92

Posted on Mar 18, 2022

In response to "Yes, PHP Is Faster Than C#"

#csharp #php #performance

A recent blog post titled "Yes, PHP Is Faster Than C#" has sparked quite a conversation. I decided to run the tests mentioned in the post and found some interesting results, which I think are worth sharing.

The benchmark used here reads a file from the file system in 4 KiB chunks and counts the number of bytes equal to the character '1'. First off, I don't find this "benchmark" very meaningful, especially since it involves reading files from disk. Many things can affect file-system performance (caches, the state of the disk drive, how busy the kernel is at the time), none of which is addressed in the test itself.
Nonetheless, the results do indicate some interesting performance characteristics that we can talk about.

Source code for the test can be found here: https://github.com/dhhoang/csharp-php-file-read

Small files

I generated the test file like this:

# for this test, we will use file_size of 4 MiB as specified in the original post
base64 /dev/urandom | head -c [file_size] > test.txt

The code for the PHP (8.0) program looks something like this:

<?php

function test()
{
    $file = fopen("/path/to/test.txt", 'r');
    $counter = 0;
    $timer = microtime(true);
    while ( ! feof($file)) {
        // fgets() returns at most 4095 bytes (length - 1) and stops early at a newline.
        $buffer = fgets($file, 4096);
        $counter += substr_count($buffer, '1');
    }
    $timer = microtime(true) - $timer;
    fclose($file);
    printf("counted %s 1s in %s milliseconds\n", number_format($counter), number_format($timer * 1000, 4));
}
test();

And for C#:

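using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

Test();

// Reads the file in 4 KiB chunks and counts the '1' bytes with LINQ.
static void Test()
{
    using var file = File.OpenRead("/path/to/test.txt");
    var counter = 0;
    var buffer = new byte[4096];
    int numRead;
    var sw = Stopwatch.StartNew();
    while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
    {
        // Only the first numRead bytes of the buffer are valid for this chunk.
        counter += buffer.Take(numRead).Count(x => x == '1');
    }
    sw.Stop();
    Console.WriteLine($"Counted {counter} 1s in {sw.ElapsedMilliseconds} milliseconds");
}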

The results when running on a t3.xlarge EC2 instance are as follows (note: each test is run 10 times and the run times are averaged after removing anomalies due to a cold file cache):

Test-C#      53.2ms
Test-PHP     11.1ms
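
For reference, the averaging mentioned in the note could be done with a small harness along these lines. This is only a sketch: the Bench class, its method name and the rule of dropping the slowest runs are assumptions made for illustration, since the post does not say exactly how anomalies were removed.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class Bench
{
    // Hypothetical harness: runs `action` `runs` times and averages the timings
    // after dropping the `dropSlowest` slowest runs (assumed to be cold-cache outliers).
    public static double AverageMilliseconds(Action action, int runs = 10, int dropSlowest = 1)
    {
        if (runs <= dropSlowest)
        {
            throw new ArgumentException("runs must be greater than dropSlowest");
        }

        var timings = new double[runs];
        for (var i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            // TotalMilliseconds keeps the fractional part, unlike ElapsedMilliseconds.
            timings[i] = sw.Elapsed.TotalMilliseconds;
        }

        // Sort ascending and keep only the fastest (runs - dropSlowest) timings.
        return timings.OrderBy(t => t).Take(runs - dropSlowest).Average();
    }
}
```

It could be called as Bench.AverageMilliseconds(Test), where Test is any parameterless method wrapping one of the loops in this post.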

So the PHP code is about 5 times faster than the C# code! Does that mean PHP really is faster than C#?

Something is definitely off here. Is .NET really that slow at reading a file? Probably not. I did a simple test where I removed the "counting" part from both programs, and their performance became very similar. The blog's author claimed that the test has "very little user-land code" and mainly tests file-reading performance. I found this to be incorrect.
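
A read-only variant of the C# loop might look roughly like the sketch below. The class and method names are hypothetical, and summing the bytes read is an assumption added so the loop still produces a value that can be checked. It is not necessarily how the counting was removed in the actual test.

```csharp
using System.IO;

public static class ReadOnlyTest
{
    // Hypothetical variant of Test(): the same 4 KiB read loop, without any counting.
    public static long ReadWithoutCounting(string path)
    {
        using var file = File.OpenRead(path);
        var buffer = new byte[4096];
        long totalRead = 0;
        int numRead;
        while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
        {
            // Track only how many bytes were read; the contents are never inspected.
            totalRead += numRead;
        }
        return totalRead;
    }
}
```

Timing this against the full Test() gives a rough idea of how much of the run time is I/O and how much is the counting itself.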

Now if you look closer at the two programs, they are very similar except for the part where the '1' bytes are counted. PHP uses the built-in substr_count function, which is heavily optimized, while the C# code uses LINQ. LINQ is a very convenient way to work with collections in C#, but it is also quite slow. What if we just count the bytes the old-fashioned way?

private static void Test_FileStream_NoLinq()
{
...
    while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
    {
        for (var c = 0; c < numRead; c++)
        {
            if (buffer[c] == '1')
            {
                counter++;
            }
        }
    }
...
}

Our result now is (see Test-C#-NoLinq):

Test-C#             53.2ms
Test-PHP            11.1ms
Test-C#-NoLinq      6.5ms

At this point C# is already running much faster than before, and nearly twice as fast as the PHP program. This shows that the byte-counting step contributes significantly to the total run time.

So the next question is, can we do even better? When working with a byte buffer, iterating through individual bytes is a pretty naive implementation. A more optimized one would use vectorization techniques such as SIMD. In fact, I would be very surprised if the substr_count function is not using vectorization. To test this, I created another PHP test function that iterates through the string instead of using substr_count, which makes it comparable to our C# Test_FileStream_NoLinq function:

function test_manual_count()
{
    ...
    while ( ! feof($file)) {
        $buffer = fgets($file, 4096);
        $length = strlen($buffer);
        for ($i = 0; $i < $length; $i++) {
            if($buffer[$i]=='1'){
                $counter += 1;
            }
        }
    }
    ...
}

And the result (see Test-PHP-Manual-Count):

Test-C#-NoLinq          6.5ms
Test-PHP                11.1ms
Test-PHP-Manual-Count   135ms

That is painfully slow, which is why it's always a good idea to use substr_count when you need to count occurrences in a string. Unfortunately, C# does not provide a built-in method with the same functionality; however, it does offer a lot of primitives for implementing vectorization. I found a SIMD-based implementation of an equivalent function on StackOverflow: VectorExtensions.OccurrencesOf(ReadOnlySpan<byte>, byte). With this we can rewrite our counter:

private static void Test_FileStream_Vectorized()
{
...
    while ((numRead = file.Read(buffer, 0, buffer.Length)) != 0)
    {
        counter += buffer.AsSpan().Slice(0, numRead).OccurrencesOf((byte)'1');
    }
...
}
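
To give an idea of what such a helper does internally, here is a minimal sketch built on System.Numerics.Vector<byte>. It is not the StackOverflow implementation: ByteCounter and CountVectorized are hypothetical names, it assumes .NET 6 or later (for Vector.Sum), and a tuned version would likely accumulate in vector registers instead of summing every block.

```csharp
using System;
using System.Numerics;

public static class ByteCounter
{
    // Hypothetical helper: counts the bytes in `data` equal to `value`,
    // comparing Vector<byte>.Count bytes per iteration.
    public static int CountVectorized(ReadOnlySpan<byte> data, byte value)
    {
        var count = 0;
        var i = 0;

        if (Vector.IsHardwareAccelerated)
        {
            var width = Vector<byte>.Count;
            var target = new Vector<byte>(value);
            for (; i + width <= data.Length; i += width)
            {
                var block = new Vector<byte>(data.Slice(i, width));
                // Matching lanes come back as 0xFF; mask them down to 1.
                var ones = Vector.Equals(block, target) & Vector<byte>.One;
                // At most `width` lanes are set, so the byte-sized sum cannot overflow.
                count += Vector.Sum(ones);
            }
        }

        // Scalar loop for the tail (or for everything without SIMD support).
        for (; i < data.Length; i++)
        {
            if (data[i] == value)
            {
                count++;
            }
        }

        return count;
    }
}
```

Inside the read loop this would be called as counter += ByteCounter.CountVectorized(buffer.AsSpan(0, numRead), (byte)'1').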

And the result (see Test-C#-Vectorization):

Test-C#-NoLinq          6.5ms
Test-C#-Vectorization   1.0ms

That is roughly 6 times faster than the manual loop and about 10 times faster than PHP 😊.

Large file

For this test, I'm using a 3.2 GB Ubuntu ISO image. The results look like this:

Test-PHP                3228.4ms
Test-PHP-Manual-Count   103966.7ms
Test-C#-NoLinq          5175.3ms
Test-C#-Vectorization   1104.7ms

Here we can clearly see how using vectorization makes things a lot faster for both languages.

Top comments (8)

Lito • Mar 18 '22

And memory usage?

goaty92 • Mar 18 '22

Haven't looked into this. From my experience, the .NET GC tends to be pretty generous with memory allocation, especially with ServerGC, so C# programs usually have a larger memory footprint than, say, Go or NodeJS. I don't have enough experience with PHP, but it would definitely be interesting to look into.

Jacek Andrzejewski • Mar 28 '22

You could use memory_get_peak_usage or memory_get_usage to check how much memory is used in PHP. Just add the first one at the end of the script.

Chris Punches • Mar 20 '22

This performance test is not accurate. It does the time capture from within the same process -- the same function even, so it skips over the time spent loading the runtime, running through linked library symbols, time to load the binary, time to execute those symbols -- and pretty much every other piece besides in-runtime translation.

This article should either be retracted or rewritten to use external timer processes (such as time; please see man time for more details).

goaty92 • Mar 20 '22

Agree 🙂. Note that I said at the beginning that I don't find the test to be meaningful. This was done just to respond to the same test in the original post.

goaty92 • Mar 18 '22

My main point is not so much to "counter" the original post as to point out how easy it is to misunderstand the performance characteristics of programs 🙂. In fact, I believe the PHP code could be further optimized to be much faster as well.

Jacek Andrzejewski • Mar 28 '22

That's true. One thing is that while and for loops in PHP are painfully slow compared to foreach. If you use foreach to check the characters, you can forget about strlen too.
One other minor thing you could do is import all the built-in functions you are using, or prefix them with the namespace (for built-ins it's "\").

Overall I agree, benchmarks like this make no sense if you don't go extremely in depth to make sure you are really testing the same thing.