C# Pattern Matching Performance Overhead

#csharp #performance

Pattern matching is great, makes life easier, but it comes with a cost. Most of the time the cost is negligible. But it is better to know when not to use it.

Lets assume following simple case,

Pattern Matching

    public abstract class Shape {
    }

    public class Circle: Shape {
        public double Radius;
    }

    public class Rectangle: Shape {
        public double Width;
        public double Height;
    }

We want to calculate area, by using pattern matching, so here is the code.

    public double Pattern(Shape shape) {
        switch(shape) {
            case Circle circle:
                return Math.PI * circle.Radius * circle.Radius;
            case Rectangle rectangle:
                return rectangle.Width * rectangle.Height;
        }
        throw new NotSupportedException();
    }

Now lets analyze JIT ASM generated for above code,

C.Pattern(Shape)
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: vzeroupper
    L0008: mov rsi, rdx
    L000b: mov rdx, rsi
    L000e: mov rcx, 0x7ffb9945d250
    L0018: call 0x00007ffbef626620
    L001d: test rax, rax
    L0020: jne short L0049
    L0022: mov rdx, rsi
    L0025: mov rcx, 0x7ffb9945d3c0
    L002f: call 0x00007ffbef626620
    L0034: test rax, rax
    L0037: je short L0064
    L0039: vmovsd xmm0, [rax+8]
    L003e: vmulsd xmm0, xmm0, [rax+0x10]
    L0043: add rsp, 0x20
    L0047: pop rsi
    L0048: ret
    L0049: vmovsd xmm0, [rax+8]
    L004e: vmovaps xmm1, xmm0
    L0052: vmulsd xmm1, xmm1, [C.Pattern(Shape)]
    L005a: vmulsd xmm0, xmm0, xmm1
    L005e: add rsp, 0x20
    L0062: pop rsi
    L0063: ret
    L0064: mov rcx, 0x7ffb8fbddfd8
    L006e: call 0x00007ffbef6278f0
    L0073: mov rsi, rax
    L0076: mov rcx, rsi
    L0079: call System.NotSupportedException..ctor()
    L007e: mov rcx, rsi
    L0081: call 0x00007ffbef5eb3a0
    L0086: int3

Notice lines, L0018: call 0x00007ffbef626620 and L002f: call 0x00007ffbef626620, both calls are to check whether the object is of instance requested. I was also surprised to see this as pattern matching can also match an interface/base class. So it is not simple comparison. Every pattern matching case requires separate CALL instruction, which is a considerable overhead on CPU.

So, in simple words, the last case will always require all instance of CALL before hitting the case. You can certainly move the most used cases on the top to make life easier.

Plain old OOPS

    public abstract class Shape {
        public abstract double Area();
    }

    public class Circle: Shape {
        public double Radius;

        public override double Area()
        {
            return Math.PI * Radius * Radius;
        }
    }

    public class Rectangle: Shape {
        public double Width;
        public double Height;

        public override double Area() {
            return Width * Height;
        }
    }

Now lets call Area method and see what JIT ASM generates.

    public double Area(Shape shape) {
        return shape.Area();
    }

C.Area(Shape)
    L0000: mov rcx, rdx
    L0003: mov rax, [rdx]
    L0006: mov rax, [rax+0x40]
    L000a: mov rax, [rax+0x20]
    L000e: jmp rax

It is simplest single call, it loads and address and jumps to it and returns from the method. The speed is achieved by avoiding instance of checks.

Complete sample

https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIGYACMhgYQYG8aHunHiUGAWQAUASg5ceAX0ndZvBgBMIAV2AAbGAwCCsbMIDKAC2wAHLbhPnxnaj3tMA7A0tmYAOl0x9ogNzyZOx55eiVVDS0ABWwMDBgoADtDKwsUm3l7XAB3AEsMMCNhV2sJIIcHMDwtFhyoME0GMFr6mBAM8odiZ0EYo3dIgEkGACpG5s13ACVsRRyVXBGxuonp2fn/Mo7uStwtSZgwDGwEgHMG2EPjs9b2re4uhguj04mAdRzFDCNFp6uJgAkYDkTkYMBstoEtl8oBAsgwEjA4QA5CAYAwqUymaBxRQAUQQYBgpgwOQgST8AXkIUY2GAuAwUGwhyYpAYxjcpS2oVp9MZzOUagaXh84OkVM2oWYNWWrTZKU5HVCAoiDFWc1wG1uCggADd4lAPlplUK9GItbY7jwHj0vv0hqM1fNFo6NVrIWLNtSWaqDs9riA5RyLYrGMatO9PkZReUleEGoDgaDow4taFdfrDWFBVphWIFZaHhGvosEyCwW7KdQpEA=

Can performance of Pattern matching improved?

Not really, high level languages usually store too much type information in order to provide improved pattern matching. But underlying cost of every single comparison is costly.

In Processor, any type of processor, be it ARM, X64 or any, single numeric comparison is much caster than hierarchical or sequential types. Basically type matching is a multi step comparison.

This is also the reason V8 team decided to store Enum flag to detect type in JavaScript inbuilt type instead of pattern matching.

Branchless

OOPS will also make most of your code branchless, which makes CPU eagerly load next set of instruction to execute instead of missing branch cache and slowing the execution.

More about branchless coding

DEV Community