DEV Community

Cover image for Morton encoding with AVX512
Wunk
Wunk

Posted on

4 2

Morton encoding with AVX512

In anticipation of Icelake bringing AVX512 to consumer hardware, I've come up with an alternative to the pdep method for encoding a 32-bit 2D coordinate into a 64-bit morton code which utilizes the vpshufbitqmb instruction of the BITALG subset of AVX512.
vpshufbitqmb allows building a new value bit by bit by indexing into a 64-bit lane(into an AVX512 k-register mask).

It relies on interpreting two 32-bit X and Y coordinates as a continuous 64-bit value though. It can support much higher dimensions provided you python-zip(as in, feed in and interleave the leading bytes or so of the coordinates vector elements into each 64-bit lane before selecting bits) certain bytes of the input coordinates.

There is no throughput or latency data at the moment for this instruction or any hardware to actually test this upon but the assembly looks very promising, compared to the currently fastest pdep method. Especially for any bulk-processing where all the constants are loaded outside of the loop.

An illustrative example of the two methods:

#pragma pack(push, 1)
union Coord2D
{
    std::uint32_t u32[2]; // x, y
    std::uint64_t u64;
};
#pragma pack(pop)

std::uint64_t Morton2D_AVX512( const Coord2D& coord )
{
    const __m512i CoordVec = _mm512_set1_epi64(coord.u64);
    // Interleave bits from 32-bit X and Y coordinate
    const __mmask64 Interleave = _mm512_bitshuffle_epi64_mask(
        CoordVec,+
        _mm512_set_epi16(
            0x20'00 + 0x01'01 * 31, 0x20'00 + 0x01'01 * 30,
            0x20'00 + 0x01'01 * 29, 0x20'00 + 0x01'01 * 28,
            0x20'00 + 0x01'01 * 27, 0x20'00 + 0x01'01 * 26,
            0x20'00 + 0x01'01 * 25, 0x20'00 + 0x01'01 * 24,
            0x20'00 + 0x01'01 * 23, 0x20'00 + 0x01'01 * 22,
            0x20'00 + 0x01'01 * 21, 0x20'00 + 0x01'01 * 20,
            0x20'00 + 0x01'01 * 19, 0x20'00 + 0x01'01 * 18,
            0x20'00 + 0x01'01 * 17, 0x20'00 + 0x01'01 * 16,
            0x20'00 + 0x01'01 * 15, 0x20'00 + 0x01'01 * 14,
            0x20'00 + 0x01'01 * 13, 0x20'00 + 0x01'01 * 12,
            0x20'00 + 0x01'01 * 11, 0x20'00 + 0x01'01 * 10,
            0x20'00 + 0x01'01 *  9, 0x20'00 + 0x01'01 *  8,
            0x20'00 + 0x01'01 *  7, 0x20'00 + 0x01'01 *  6,
            0x20'00 + 0x01'01 *  5, 0x20'00 + 0x01'01 *  4,
            0x20'00 + 0x01'01 *  3, 0x20'00 + 0x01'01 *  2,
            0x20'00 + 0x01'01 *  1, 0x20'00 + 0x01'01 *  0
        )
    );
    return _cvtmask64_u64(Interleave);
}

std::uint64_t Morton2D_BMI(const Coord2D& coord)
{
    return _pdep_u64(static_cast<std::uint64_t>(coord.u32[0]), 0x5555555555555555)
        | _pdep_u64(static_cast<std::uint64_t>(coord.u32[1]), 0xAAAAAAAAAAAAAAAA);
}

The generated assembly using gcc 9.1 with -Ofast -march=icelake-client:

Morton2D_AVX512(Coord2D const&):
        vpbroadcastq    zmm0, QWORD PTR [rdi]
        vpshufbitqmb    k0, zmm0, ZMMWORD PTR .LC0[rip]
        kmovq   rax, k0
        vzeroupper
        ret
Morton2D_BMI(Coord2D const&):
        mov     eax, DWORD PTR [rdi]
        movabs  rdx, 6148914691236517205
        pdep    rax, rax, rdx
        mov     edx, DWORD PTR [rdi+4]
        movabs  rcx, -6148914691236517206
        pdep    rdx, rdx, rcx
        or      rax, rdx
        ret

I have verified the implementation with Intel's emulator and I'd love to update with some actual benchmarks once I get some Icelake hardware in my hands.

Heroku

Amplify your impact where it matters most — building exceptional apps.

Leave the infrastructure headaches to us, while you focus on pushing boundaries, realizing your vision, and making a lasting impression on your users.

Get Started

Top comments (0)

AWS Q Developer image

Your AI Code Assistant

Automate your code reviews. Catch bugs before your coworkers. Fix security issues in your code. Built to handle large projects, Amazon Q Developer works alongside you from idea to production code.

Get started free in your IDE

👋 Kindness is contagious

Engage with a wealth of insights in this thoughtful article, valued within the supportive DEV Community. Coders of every background are welcome to join in and add to our collective wisdom.

A sincere "thank you" often brightens someone’s day. Share your gratitude in the comments below!

On DEV, the act of sharing knowledge eases our journey and fortifies our community ties. Found value in this? A quick thank you to the author can make a significant impact.

Okay