nuskey

Posted on Mar 29

zerompk: The Fastest MessagePack Implementation for Rust and Its Optimization Techniques

#rust #performance

I've released zerompk, a fast MessagePack serializer for Rust!

The most widely used MessagePack serializer in Rust is probably rmp_serde. While it is fast enough for most purposes, zerompk is designed with even greater speed in mind.

Let's look at the benchmarks.

MessagePack is a binary format, so it is naturally faster than JSON serializers, but zerompk also shows a significant difference compared with other MessagePack serializers such as rmp_serde and msgpacker.

How to Use

Basic instructions can be found in the README file, but I'll give a brief explanation here as well.

use zerompk::{FromMessagePack, ToMessagePack};

// The type to be serialized must implement FromMessagePack and ToMessagePack
#[derive(FromMessagePack, ToMessagePack)]
#[msgpack(array)] // Choose array or map; the default is array
pub struct Person {
    #[msgpack(key = 0)] // Key index; making it explicit is recommended whenever possible
    pub name: String,

    #[msgpack(key = 1)]
    pub age: u32,

    #[msgpack(ignore)] // Fields marked ignore are skipped
    pub metadata: Metadata,
}

// Contents omitted as they are not important here
#[derive(Default)] // Ignored fields must implement Default so they can be filled in on deserialization
pub struct Metadata;

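fn main() {
    let person = Person {
        name: "Alice".to_string(), // name is a String, not &str
        age: 18,
        metadata: Metadata, // every field must be initialized, including ignored ones
    };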

    // Serialization
    let mut buf = vec![0; 256];
    let msgpack: Vec<u8> = zerompk::to_msgpack(&person, &mut buf)
        .unwrap();

    // Deserialization
    let person: Person = zerompk::from_msgpack(&msgpack)
        .unwrap();
}

Implement FromMessagePack/ToMessagePack and use zerompk::from_msgpack()/zerompk::to_msgpack(). These functions read from and write to a [u8] slice directly; this is the basic usage and also the fastest.

Streaming through std::io::{Read, Write} is also supported via read_msgpack()/write_msgpack().

fn main() {
    let file = std::fs::File::open("example.msgpack")
        .unwrap();

    // Many small reads on a raw File are slow, so always wrap it in a BufReader.
    // zerompk does not buffer internally.
    // The same is true of rmp_serde and serde_json.
    let mut buf_reader = std::io::BufReader::new(file);

    // Deserialization using Read
    let person: Person = zerompk::read_msgpack(&mut buf_reader)
        .unwrap();
}

However, and this may be counterintuitive, reading through std::io::Read is still slow even with buffering. Unless the data is very large, it is faster to load it into memory first and then deserialize from the slice. (The serde_json documentation states this explicitly as well.)

use std::io::Read; // brings read_to_end into scope

fn main() {
    let mut file = std::fs::File::open("example.msgpack")
        .unwrap();

    // Write the file contents to buf
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)
        .unwrap();

    let person: Person = zerompk::from_msgpack(&buf)
        .unwrap();
}

This could potentially be optimized with a dedicated implementation for BufRead and Seek, which zerompk does not have yet. The effect looks quite significant, though, so we'd like to implement it soon.

No Serde

As seen in the previous sample code, zerompk does not build on Serde; it serializes and deserializes through its own traits instead.

#[derive(FromMessagePack, ToMessagePack)] // This part
pub struct Point {
    pub x: i32,
    pub y: i32,
}

Since the vast majority of Rust serializers are built on top of Serde, this might seem inconvenient. However, when seeking the absolute best performance, moving away from Serde is a highly rational decision.

The reason is that Serde’s generated code is more complex than one might imagine, which has a non-negligible impact not only on compilation times but also on runtime performance. Let’s compare the code generated by Serde versus zerompk for the Point struct above.

Serde

#[doc(hidden)]
#[allow(
    non_upper_case_globals,
    unused_attributes,
    unused_qualifications,
    clippy::absolute_paths,
)]
const _: () = {
    #[allow(unused_extern_crates, clippy::useless_attribute)]
    extern crate serde as _serde;
    #[automatically_derived]
    impl _serde::Serialize for Point {
        fn serialize<__S>(
            &self,
            __serializer: __S,
        ) -> _serde::__private228::Result<__S::Ok, __S::Error>
        where
            __S: _serde::Serializer,
        {
            let mut __serde_state = _serde::Serializer::serialize_struct(
                __serializer,
                "Point",
                false as usize + 1 + 1,
            )?;
            _serde::ser::SerializeStruct::serialize_field(
                &mut __serde_state,
                "x",
                &self.x,
            )?;
            _serde::ser::SerializeStruct::serialize_field(
                &mut __serde_state,
                "y",
                &self.y,
            )?;
            _serde::ser::SerializeStruct::end(__serde_state)
        }
    }
};
#[doc(hidden)]
#[allow(
    non_upper_case_globals,
    unused_attributes,
    unused_qualifications,
    clippy::absolute_paths,
)]
const _: () = {
    #[allow(unused_extern_crates, clippy::useless_attribute)]
    extern crate serde as _serde;
    #[automatically_derived]
    impl<'de> _serde::Deserialize<'de> for Point {
        fn deserialize<__D>(
            __deserializer: __D,
        ) -> _serde::__private228::Result<Self, __D::Error>
        where
            __D: _serde::Deserializer<'de>,
        {
            #[allow(non_camel_case_types)]
            #[doc(hidden)]
            enum __Field {
                __field0,
                __field1,
                __ignore,
            }
            #[doc(hidden)]
            struct __FieldVisitor;
            #[automatically_derived]
            impl<'de> _serde::de::Visitor<'de> for __FieldVisitor {
                type Value = __Field;
                fn expecting(
                    &self,
                    __formatter: &mut _serde::__private228::Formatter,
                ) -> _serde::__private228::fmt::Result {
                    _serde::__private228::Formatter::write_str(
                        __formatter,
                        "field identifier",
                    )
                }
                fn visit_u64<__E>(
                    self,
                    __value: u64,
                ) -> _serde::__private228::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        0u64 => _serde::__private228::Ok(__Field::__field0),
                        1u64 => _serde::__private228::Ok(__Field::__field1),
                        _ => _serde::__private228::Ok(__Field::__ignore),
                    }
                }
                fn visit_str<__E>(
                    self,
                    __value: &str,
                ) -> _serde::__private228::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        "x" => _serde::__private228::Ok(__Field::__field0),
                        "y" => _serde::__private228::Ok(__Field::__field1),
                        _ => _serde::__private228::Ok(__Field::__ignore),
                    }
                }
                fn visit_bytes<__E>(
                    self,
                    __value: &[u8],
                ) -> _serde::__private228::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        b"x" => _serde::__private228::Ok(__Field::__field0),
                        b"y" => _serde::__private228::Ok(__Field::__field1),
                        _ => _serde::__private228::Ok(__Field::__ignore),
                    }
                }
            }
            #[automatically_derived]
            impl<'de> _serde::Deserialize<'de> for __Field {
                #[inline]
                fn deserialize<__D>(
                    __deserializer: __D,
                ) -> _serde::__private228::Result<Self, __D::Error>
                where
                    __D: _serde::Deserializer<'de>,
                {
                    _serde::Deserializer::deserialize_identifier(
                        __deserializer,
                        __FieldVisitor,
                    )
                }
            }
            #[doc(hidden)]
            struct __Visitor<'de> {
                marker: _serde::__private228::PhantomData<Point>,
                lifetime: _serde::__private228::PhantomData<&'de ()>,
            }
            #[automatically_derived]
            impl<'de> _serde::de::Visitor<'de> for __Visitor<'de> {
                type Value = Point;
                fn expecting(
                    &self,
                    __formatter: &mut _serde::__private228::Formatter,
                ) -> _serde::__private228::fmt::Result {
                    _serde::__private228::Formatter::write_str(
                        __formatter,
                        "struct Point",
                    )
                }
                #[inline]
                fn visit_seq<__A>(
                    self,
                    mut __seq: __A,
                ) -> _serde::__private228::Result<Self::Value, __A::Error>
                where
                    __A: _serde::de::SeqAccess<'de>,
                {
                    let __field0 = match _serde::de::SeqAccess::next_element::<
                        i32,
                    >(&mut __seq)? {
                        _serde::__private228::Some(__value) => __value,
                        _serde::__private228::None => {
                            return _serde::__private228::Err(
                                _serde::de::Error::invalid_length(
                                    0usize,
                                    &"struct Point with 2 elements",
                                ),
                            );
                        }
                    };
                    let __field1 = match _serde::de::SeqAccess::next_element::<
                        i32,
                    >(&mut __seq)? {
                        _serde::__private228::Some(__value) => __value,
                        _serde::__private228::None => {
                            return _serde::__private228::Err(
                                _serde::de::Error::invalid_length(
                                    1usize,
                                    &"struct Point with 2 elements",
                                ),
                            );
                        }
                    };
                    _serde::__private228::Ok(Point { x: __field0, y: __field1 })
                }
                #[inline]
                fn visit_map<__A>(
                    self,
                    mut __map: __A,
                ) -> _serde::__private228::Result<Self::Value, __A::Error>
                where
                    __A: _serde::de::MapAccess<'de>,
                {
                    let mut __field0: _serde::__private228::Option<i32> = _serde::__private228::None;
                    let mut __field1: _serde::__private228::Option<i32> = _serde::__private228::None;
                    while let _serde::__private228::Some(__key) = _serde::de::MapAccess::next_key::<
                        __Field,
                    >(&mut __map)? {
                        match __key {
                            __Field::__field0 => {
                                if _serde::__private228::Option::is_some(&__field0) {
                                    return _serde::__private228::Err(
                                        <__A::Error as _serde::de::Error>::duplicate_field("x"),
                                    );
                                }
                                __field0 = _serde::__private228::Some(
                                    _serde::de::MapAccess::next_value::<i32>(&mut __map)?,
                                );
                            }
                            __Field::__field1 => {
                                if _serde::__private228::Option::is_some(&__field1) {
                                    return _serde::__private228::Err(
                                        <__A::Error as _serde::de::Error>::duplicate_field("y"),
                                    );
                                }
                                __field1 = _serde::__private228::Some(
                                    _serde::de::MapAccess::next_value::<i32>(&mut __map)?,
                                );
                            }
                            _ => {
                                let _ = _serde::de::MapAccess::next_value::<
                                    _serde::de::IgnoredAny,
                                >(&mut __map)?;
                            }
                        }
                    }
                    let __field0 = match __field0 {
                        _serde::__private228::Some(__field0) => __field0,
                        _serde::__private228::None => {
                            _serde::__private228::de::missing_field("x")?
                        }
                    };
                    let __field1 = match __field1 {
                        _serde::__private228::Some(__field1) => __field1,
                        _serde::__private228::None => {
                            _serde::__private228::de::missing_field("y")?
                        }
                    };
                    _serde::__private228::Ok(Point { x: __field0, y: __field1 })
                }
            }
            #[doc(hidden)]
            const FIELDS: &'static [&'static str] = &["x", "y"];
            _serde::Deserializer::deserialize_struct(
                __deserializer,
                "Point",
                FIELDS,
                __Visitor {
                    marker: _serde::__private228::PhantomData::<Point>,
                    lifetime: _serde::__private228::PhantomData,
                },
            )
        }
    }
};

zerompk

impl ::zerompk::ToMessagePack for Point {
    fn write<W: ::zerompk::Write>(
        &self,
        writer: &mut W,
    ) -> ::core::result::Result<(), ::zerompk::Error> {
        writer.write_array_len(2usize)?;
        self.x.write(writer)?;
        self.y.write(writer)?;
        Ok(())
    }
}

impl<'__msgpack_de> ::zerompk::FromMessagePack<'__msgpack_de> for Point {
    fn read<R: ::zerompk::Read<'__msgpack_de>>(
        reader: &mut R,
    ) -> ::core::result::Result<Self, ::zerompk::Error>
    where
        Self: Sized,
    {
        reader.increment_depth()?;
        let __result = {
            reader.check_array_len(2usize)?;
            Ok(Self {
                x: <i32 as ::zerompk::FromMessagePack<'__msgpack_de>>::read(reader)?,
                y: <i32 as ::zerompk::FromMessagePack<'__msgpack_de>>::read(reader)?,
            })
        };
        reader.decrement_depth();
        __result
    }
}

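Compared with Serde's visitor-based code, the code generated by zerompk is extremely straightforward: serialization writes the array header and then each field, and deserialization checks the array length and then reads each field in order. It is clear at a glance why zerompk is in a position to be faster.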

Additionally, as a side benefit, the smaller amount of generated code contributes to reduced compilation times and smaller binary sizes.

Zero Copy

Copying is often a bottleneck when writing fast code in Rust. To achieve high performance, it's crucial to process data using zero copy whenever possible.

However, unlike serializers such as rkyv that achieve full zero copy, MessagePack makes it difficult to eliminate copying entirely. It uses variable-length encodings and big-endian byte order, so its bytes cannot be mapped directly onto Rust structs.
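To make this concrete, here is a minimal std-only sketch (not zerompk's code) that decodes a few of the integer formats from the MessagePack specification. Depending on its magnitude, the same number takes 1 to 5 bytes here (9 with the 64-bit formats, which are omitted), and multi-byte payloads are big-endian, so every value has to be decoded rather than reinterpreted in place. `decode_int` is a hypothetical helper that handles only the formats named in its comments.

```rust
/// Decodes one MessagePack integer from the front of `input`.
/// Returns the value and the number of bytes consumed, or None if the
/// marker is not one of the formats handled here or the input is truncated.
fn decode_int(input: &[u8]) -> Option<(i64, usize)> {
    let (&marker, rest) = input.split_first()?;
    match marker {
        // positive fixint: the value lives in the marker byte itself
        0x00..=0x7f => Some((marker as i64, 1)),
        // negative fixint: 0xe0..=0xff encodes -32..=-1
        0xe0..=0xff => Some(((marker as i8) as i64, 1)),
        // uint 8: one payload byte
        0xcc => Some((*rest.first()? as i64, 2)),
        // uint 16: two big-endian payload bytes
        0xcd => {
            let b = rest.get(..2)?;
            Some((u16::from_be_bytes([b[0], b[1]]) as i64, 3))
        }
        // uint 32: four big-endian payload bytes
        0xce => {
            let b = rest.get(..4)?;
            Some((u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as i64, 5))
        }
        // int 32: four big-endian payload bytes, signed
        0xd2 => {
            let b = rest.get(..4)?;
            Some((i32::from_be_bytes([b[0], b[1], b[2], b[3]]) as i64, 5))
        }
        _ => None,
    }
}
```

For example, 256 encoded as a uint 16 is `cd 01 00`, while the in-memory bytes of `256u16` on a little-endian machine are `00 01`, so even a fixed-width field cannot simply be cast.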

As a compromise, zerompk implements partial zero copy, similar to Serde: string and binary values can be deserialized into &str/&[u8], which borrow slices of the original data directly.
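The borrowing idea can be sketched without zerompk. The function below is a hypothetical illustration, not the library's implementation: it reads a MessagePack str value from a byte slice and returns a `&str` that points into the input, so no `String` is allocated. Only the fixstr (0xa0..=0xbf) and str 8 (0xd9) headers are handled.

```rust
use std::str;

/// Reads a MessagePack string from the front of `input` and returns it
/// together with the remaining bytes. The returned &str borrows from `input`.
fn read_borrowed_str(input: &[u8]) -> Option<(&str, &[u8])> {
    let (&marker, rest) = input.split_first()?;
    let (len, rest) = match marker {
        // fixstr: the length is stored in the low 5 bits of the marker
        0xa0..=0xbf => ((marker & 0x1f) as usize, rest),
        // str 8: a single length byte follows the marker
        0xd9 => {
            let (&len, rest) = rest.split_first()?;
            (len as usize, rest)
        }
        _ => return None,
    };
    if rest.len() < len {
        return None; // truncated input
    }
    let (bytes, tail) = rest.split_at(len);
    // UTF-8 is still validated because the caller asked for a &str,
    // but the bytes themselves are not copied.
    let s = str::from_utf8(bytes).ok()?;
    Some((s, tail))
}
```

Because the returned `&str` points into `input`, it cannot outlive the buffer, which is what the `'a` lifetime on `NoCopy<'a>` expresses.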

use zerompk::{FromMessagePack, ToMessagePack};

#[derive(ToMessagePack, FromMessagePack)]
pub struct NoCopy<'a> {
    pub str: &'a str,
    pub bin: &'a [u8],
}

fn main() -> zerompk::Result<()> {
    let value = NoCopy {
        str: "hello",
        bin: &[0x01, 0x02, 0x03],
    };
    let msgpack = zerompk::to_msgpack_vec(&value)?;
    // str and bin borrow directly from `msgpack`; nothing is copied
    let value: NoCopy = zerompk::from_msgpack(&msgpack)?;
    Ok(())
}

Optimizing Map Deserialization

MessagePack has two ways to represent objects: Array and Map.

For example, consider the following structure:

#[derive(ToMessagePack, FromMessagePack)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    // We want to serialize this.
    let p = Point {
        x: 1,
        y: 2,
    };
}

Representing this as an array:

+--------+------+------+
| FixArr | x    | y    |
| 0x92   | 0x01 | 0x02 |
+--------+------+------+

Representing this as a map:

+--------+-----+-------+-----+-------+
| FixMap | key | value | key | value |
| 0x82   | "x" | 0x01  | "y" | 0x02  |
+--------+-----+-------+-----+-------+

Naturally, arrays have advantages in terms of binary size and performance, but maps win in terms of readability and versioning tolerance.
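The two layouts can be reproduced with a small std-only sketch. The helpers are hypothetical and cover only fixarray, fixmap, fixstr and positive fixint, so the fields are `u8` here rather than `i32`, which is enough for `Point { x: 1, y: 2 }`. Note that each key in the map form is itself a fixstr, so `"x"` costs two bytes (`0xa1 0x78`).

```rust
/// Positive fixint: values 0..=127 are written as a single byte.
fn write_fixint(out: &mut Vec<u8>, v: u8) {
    assert!(v <= 0x7f, "positive fixint holds 0..=127");
    out.push(v);
}

/// fixstr: a 0xa0 | len marker followed by up to 31 UTF-8 bytes.
fn write_fixstr(out: &mut Vec<u8>, s: &str) {
    assert!(s.len() <= 31, "fixstr holds up to 31 bytes");
    out.push(0xa0 | s.len() as u8);
    out.extend_from_slice(s.as_bytes());
}

struct Point {
    x: u8,
    y: u8,
}

/// Array form: a header with the element count, then values in field order.
fn encode_as_array(p: &Point) -> Vec<u8> {
    let mut out = vec![0x90 | 2]; // fixarray with 2 elements
    write_fixint(&mut out, p.x);
    write_fixint(&mut out, p.y);
    out
}

/// Map form: a header with the pair count, then key/value pairs.
fn encode_as_map(p: &Point) -> Vec<u8> {
    let mut out = vec![0x80 | 2]; // fixmap with 2 pairs
    write_fixstr(&mut out, "x");
    write_fixint(&mut out, p.x);
    write_fixstr(&mut out, "y");
    write_fixint(&mut out, p.y);
    out
}
```

Here the array form takes 3 bytes and the map form 7; the gap grows with the number and length of the field names.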

Serialization is straightforward in either format, and deserializing an array is just a matter of reading the values in order. The problem is deserializing a map. Since the order of keys in the binary is not fixed, reading the values in sequence is not enough: each key has to be looked up to find the field it belongs to, and that lookup has a cost.

A hand-written implementation would look like this:

use zerompk::{Error, FromMessagePack, Read, Result};

// Hand-written map deserialization for Point
impl<'a> FromMessagePack<'a> for Point {
    fn read<R: Read<'a>>(reader: &mut R) -> Result<Self> {
        let mut x: Option<i32> = None;
        let mut y: Option<i32> = None;

        let len = reader.read_map_len()?;
        for _ in 0..len {
            // Keys can appear in any order, so each one must be looked up
            let key = reader.read_string()?;
            match key.as_ref() {
                "x" => {
                    x = Some(i32::read(reader)?);
                }
                "y" => {
                    y = Some(i32::read(reader)?);
                }
                unknown => {
                    return Err(Error::UnknownKey(unknown.into()));
                }
            };
        }

        Ok(Point {
            x: x.ok_or(Error::KeyNotFound("x".into()))?,
            y: y.ok_or(Error::KeyNotFound("y".into()))?,
        })
    }
}

Unlike with arrays, every key requires a match to find its field. This cost is inherent to the map representation.
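For a version that runs without zerompk, here is a std-only sketch of the same loop. It is deliberately limited (hypothetical names and error type; fixmap header, fixstr keys and positive fixint values only; `u8` fields), but it shows the per-key lookup, the duplicate-key and missing-key checks, and that the key order in the input does not matter.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: u8,
    y: u8,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum MapError {
    Eof,
    BadFormat,
    UnknownKey(String),
    Duplicated(&'static str),
    NotFound(&'static str),
}

/// Takes `n` bytes starting at `*pos` and advances the cursor.
fn take<'a>(input: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], MapError> {
    let end = (*pos).checked_add(n).ok_or(MapError::Eof)?;
    let bytes = input.get(*pos..end).ok_or(MapError::Eof)?;
    *pos = end;
    Ok(bytes)
}

/// Decodes a fixmap of fixstr keys and positive fixint values into a Point.
fn decode_point_map(input: &[u8]) -> Result<Point, MapError> {
    let mut pos = 0;
    let header = take(input, &mut pos, 1)?[0];
    if (header & 0xf0) != 0x80 {
        return Err(MapError::BadFormat); // not a fixmap
    }
    let (mut x, mut y) = (None, None);
    for _ in 0..(header & 0x0f) {
        let key_marker = take(input, &mut pos, 1)?[0];
        if (key_marker & 0xe0) != 0xa0 {
            return Err(MapError::BadFormat); // key is not a fixstr
        }
        let key = take(input, &mut pos, (key_marker & 0x1f) as usize)?;
        let value = take(input, &mut pos, 1)?[0];
        if value > 0x7f {
            return Err(MapError::BadFormat); // value is not a positive fixint
        }
        // Keys may arrive in any order, so each one is matched against every field.
        match std::str::from_utf8(key) {
            Ok("x") if x.is_some() => return Err(MapError::Duplicated("x")),
            Ok("x") => x = Some(value),
            Ok("y") if y.is_some() => return Err(MapError::Duplicated("y")),
            Ok("y") => y = Some(value),
            Ok(other) => return Err(MapError::UnknownKey(other.to_string())),
            Err(_) => return Err(MapError::BadFormat),
        }
    }
    Ok(Point {
        x: x.ok_or(MapError::NotFound("x"))?,
        y: y.ok_or(MapError::NotFound("y"))?,
    })
}
```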

Bypassing UTF-8 Verification

This is already reasonably fast, but let's make it even faster. First, consider the read_string() call.

let key = reader.read_string()?;

The return value of zerompk::Read::read_string() is Cow<'a, str>, not String. Therefore, if the source is a slice (not std::io::Read), no String allocation occurs here.

There is still room for improvement, though. Constructing a &str from &[u8] requires verifying that the bytes are valid UTF-8. Here, however, the bytes are only compared against keys that are already valid UTF-8, and a mismatch is an error anyway: if the bytes equal a key, they are valid UTF-8 by construction. So the verification can be skipped.

Therefore, zerompk provides zerompk::Read::read_string_bytes(), which reads the string as a raw byte sequence without converting it to str.

let key_bytes = reader.read_string_bytes()?;
// Compare the raw bytes directly; no UTF-8 validation takes place
match key_bytes.as_ref() {
    b"x" => ...
    b"y" => ...
    _ => return Err(...)
};

This bypasses the unnecessary UTF-8 check.
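Why skipping the check is safe can be demonstrated with a small sketch: comparing raw bytes against a key that is valid UTF-8 always gives the same answer as validating first and then comparing strings. Both functions are illustrative and not part of zerompk.

```rust
/// Naive approach: validate UTF-8, then compare as &str.
fn matches_validated(key_bytes: &[u8], expected: &str) -> bool {
    match std::str::from_utf8(key_bytes) {
        Ok(s) => s == expected,
        Err(_) => false,
    }
}

/// Byte comparison: `expected` is already valid UTF-8, so if the bytes are
/// equal, `key_bytes` is valid UTF-8 too; if they differ, neither approach
/// reports a match.
fn matches_raw(key_bytes: &[u8], expected: &str) -> bool {
    key_bytes == expected.as_bytes()
}
```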

Automata-based String Search

For a struct this small with such short field names, this is sufficient. However, if the number of fields grows to 10 or 20, the number of arms in the match grows with it.

let key_bytes = reader.read_string_bytes()?;

match key_bytes.as_ref() {
    b"foo_bar_baz1" => ...
    b"foo_bar_baz2" => ...
    ...
    _ => return Err(...)
};

Rust compiles a match on strings in a surprisingly straightforward way. In the LLVM IR, it becomes a switch on the string length, and within each length case, an if-else chain that calls bcmp to compare the key with each candidate in turn.

bcmp itself is fast and usually good enough, but up to one call is made per candidate of the same length, so this approach slows down when there are many fields or long keys.

Another approach to this kind of string matching is a hash table built with a perfect hash function (PHF), but because structs usually have only a handful of fields, the cost of computing the hash outweighs the benefit.

Instead, zerompk uses an automaton-based search. This optimization is also used in MessagePack for C#, which performs key lookups with an 8-byte trie.

Citation: https://neue.cc/2017/08/28_558.html

As the cited diagram shows, the key's UTF-8 bytes are split into 8-byte chunks, each chunk is read as a u64, and the key is then matched with a few integer comparisons. The generated code looks roughly like this:

#[derive(ToMessagePack, FromMessagePack)]
#[msgpack(map)]
pub struct TestStruct {
    pub field_0: i32,
    pub field_1: String,
    pub field_2: bool,
    pub field_3: Vec<Point>,
}

// Code generated by the derive macro:
impl ::zerompk::ToMessagePack for TestStruct {
    fn write<W: ::zerompk::Write>(
        &self,
        writer: &mut W,
    ) -> ::core::result::Result<(), ::zerompk::Error> {
        writer.write_map_len(4usize)?;
        writer.write_string("field_0")?;
        self.field_0.write(writer)?;
        writer.write_string("field_1")?;
        self.field_1.write(writer)?;
        writer.write_string("field_2")?;
        self.field_2.write(writer)?;
        writer.write_string("field_3")?;
        self.field_3.write(writer)?;
        Ok(())
    }
}
impl<'__msgpack_de> ::zerompk::FromMessagePack<'__msgpack_de> for TestStruct {
    fn read<R: ::zerompk::Read<'__msgpack_de>>(
        reader: &mut R,
    ) -> ::core::result::Result<Self, ::zerompk::Error>
    where
        Self: Sized,
    {
        reader.increment_depth()?;
        let __result = {
            reader.check_map_len(4usize)?;
            let mut __slot_field_0: ::core::option::Option<i32> = ::core::option::Option::None;
            let mut __slot_field_1: ::core::option::Option<String> = ::core::option::Option::None;
            let mut __slot_field_2: ::core::option::Option<bool> = ::core::option::Option::None;
            let mut __slot_field_3: ::core::option::Option<Vec<Point>> = ::core::option::Option::None;
            for _ in 0..4usize {
                let __key_bytes = reader.read_string_bytes()?;
                let __key_bytes = __key_bytes.as_ref();
                let __key_index = (|| -> ::zerompk::Result<usize> {
                    let __matched_idx: usize = match __key_bytes.len() {
                        7usize => {
                            let __key_chunk_0: u64 = ((u32::from_le_bytes(unsafe {
                                *(__key_bytes.as_ptr().add(0usize) as *const [u8; 4])
                            }) as u64)
                                | ((u16::from_le_bytes(unsafe {
                                    *(__key_bytes.as_ptr().add(4usize) as *const [u8; 2])
                                }) as u64) << 32) | ((__key_bytes[6usize] as u64) << 48));
                            match __key_chunk_0 {
                                13615683802065254u64 => 0usize,
                                13897158778775910u64 => 1usize,
                                14178633755486566u64 => 2usize,
                                14460108732197222u64 => 3usize,
                                _ => usize::MAX,
                            }
                        }
                        _ => usize::MAX,
                    };
                    if __matched_idx != usize::MAX {
                        Ok(__matched_idx)
                    } else {
                        {
                            let __unknown_key = match ::core::str::from_utf8(
                                __key_bytes,
                            ) {
                                Ok(s) => s.into(),
                                Err(_) => "<invalid-utf8>".into(),
                            };
                            Err(::zerompk::Error::UnknownKey(__unknown_key))
                        }
                    }
                })()?;
                match __key_index {
                    0usize => {
                        if __slot_field_0.is_some() {
                            return Err(
                                ::zerompk::Error::KeyDuplicated("field_0".into()),
                            );
                        }
                        __slot_field_0 = ::core::option::Option::Some(
                            <i32 as ::zerompk::FromMessagePack<
                                '__msgpack_de,
                            >>::read(reader)?,
                        );
                    }
                    1usize => {
                        if __slot_field_1.is_some() {
                            return Err(
                                ::zerompk::Error::KeyDuplicated("field_1".into()),
                            );
                        }
                        __slot_field_1 = ::core::option::Option::Some(
                            <String as ::zerompk::FromMessagePack<
                                '__msgpack_de,
                            >>::read(reader)?,
                        );
                    }
                    2usize => {
                        if __slot_field_2.is_some() {
                            return Err(
                                ::zerompk::Error::KeyDuplicated("field_2".into()),
                            );
                        }
                        __slot_field_2 = ::core::option::Option::Some(
                            <bool as ::zerompk::FromMessagePack<
                                '__msgpack_de,
                            >>::read(reader)?,
                        );
                    }
                    3usize => {
                        if __slot_field_3.is_some() {
                            return Err(
                                ::zerompk::Error::KeyDuplicated("field_3".into()),
                            );
                        }
                        __slot_field_3 = ::core::option::Option::Some(
                            <Vec<
                                Point,
                            > as ::zerompk::FromMessagePack<
                                '__msgpack_de,
                            >>::read(reader)?,
                        );
                    }
                    _ => {
                        ::core::panicking::panic(
                            "internal error: entered unreachable code",
                        )
                    }
                }
            }
            let field_0 = __slot_field_0
                .ok_or_else(|| ::zerompk::Error::KeyNotFound("field_0".into()))?;
            let field_1 = __slot_field_1
                .ok_or_else(|| ::zerompk::Error::KeyNotFound("field_1".into()))?;
            let field_2 = __slot_field_2
                .ok_or_else(|| ::zerompk::Error::KeyNotFound("field_2".into()))?;
            let field_3 = __slot_field_3
                .ok_or_else(|| ::zerompk::Error::KeyNotFound("field_3".into()))?;
            Ok(Self {
                field_0: field_0,
                field_1: field_1,
                field_2: field_2,
                field_3: field_3,
            })
        };
        reader.decrement_depth();
        __result
    }
}

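The derive macro builds this automaton at compile time. Each lookup becomes a switch on the key length followed by integer comparisons of u64 chunks against precomputed constants, rather than a chain of bcmp calls, which makes map deserialization faster than a plain match on strings.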
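A stripped-down version of the generated lookup can be written by hand. This is a sketch under simplifying assumptions: keys are at most 8 bytes, so one u64 chunk suffices (longer keys would need a nested match on each following 8-byte chunk, which is what makes it a trie), and `pack_le`/`key_index` are hypothetical names. The constants are computed at compile time by a `const fn`, standing in for the literals the derive macro emits.

```rust
/// Packs up to 8 key bytes into a u64, little-endian, zero-padded.
/// Being a const fn, it can compute the match constants at compile time.
const fn pack_le(bytes: &[u8]) -> u64 {
    let mut value = 0u64;
    let mut i = 0;
    while i < bytes.len() && i < 8 {
        value |= (bytes[i] as u64) << (8 * i);
        i += 1;
    }
    value
}

// Precomputed constants, standing in for the literals in the derive output.
const FIELD_0: u64 = pack_le(b"field_0");
const FIELD_1: u64 = pack_le(b"field_1");
const FIELD_2: u64 = pack_le(b"field_2");
const FIELD_3: u64 = pack_le(b"field_3");

/// Maps a key to its field index with a length switch plus one integer match.
fn key_index(key: &[u8]) -> Option<usize> {
    // Matching on the length first matters: with zero padding,
    // b"field_0\0" would otherwise pack to the same u64 as b"field_0".
    match key.len() {
        7 => match pack_le(key) {
            FIELD_0 => Some(0),
            FIELD_1 => Some(1),
            FIELD_2 => Some(2),
            FIELD_3 => Some(3),
            _ => None,
        },
        _ => None,
    }
}
```

As a cross-check, `pack_le(b"field_0")` evaluates to 13615683802065254, the same constant that appears in the generated code.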

Security

Vulnerabilities are an important consideration when writing a serializer. Incoming binaries cannot always be trusted, so the serializer needs some defenses of its own during deserialization.

That said, zerompk always deserializes into a statically known type with strict type checks, so vulnerabilities caused by deserializing unexpected types, which are common in Java, .NET, and JavaScript, cannot occur.

Other attacks include embedding a huge array length header in a small binary, or exhausting the call stack with deeply nested objects. To counter these, zerompk first verifies that the buffer is large enough for the declared length (calling Vec::with_capacity() on an unchecked length is a bad idea!) and returns an error when the nesting depth exceeds a specified limit.
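Both defenses can be illustrated with a small std-only decoder for nested arrays. This is a sketch rather than zerompk's implementation: the type names, the error enum and the `max_depth` parameter are assumptions, and only positive fixint, fixarray and array 32 are supported. The depth counter mirrors the `increment_depth()`/`decrement_depth()` calls visible in the generated code above, and the length check runs before `Vec::with_capacity()`.

```rust
/// Errors produced by this sketch decoder (hypothetical, not zerompk's Error).
#[derive(Debug, PartialEq)]
enum DecodeError {
    Eof,
    LengthTooLarge,
    DepthExceeded,
    Unsupported(u8),
}

#[derive(Debug, PartialEq)]
enum Value {
    Int(u8),
    Array(Vec<Value>),
}

struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
    depth: usize,
    max_depth: usize,
}

impl<'a> Reader<'a> {
    fn new(buf: &'a [u8], max_depth: usize) -> Self {
        Reader { buf, pos: 0, depth: 0, max_depth }
    }

    fn remaining(&self) -> usize {
        self.buf.len() - self.pos
    }

    /// Takes `n` bytes, failing instead of reading past the end.
    fn take(&mut self, n: usize) -> Result<&'a [u8], DecodeError> {
        let buf = self.buf;
        let end = self.pos.checked_add(n).ok_or(DecodeError::Eof)?;
        let bytes = buf.get(self.pos..end).ok_or(DecodeError::Eof)?;
        self.pos = end;
        Ok(bytes)
    }

    fn read_value(&mut self) -> Result<Value, DecodeError> {
        let marker = self.take(1)?[0];
        match marker {
            0x00..=0x7f => Ok(Value::Int(marker)), // positive fixint
            0x90..=0x9f => self.read_array((marker & 0x0f) as usize), // fixarray
            0xdd => {
                // array 32: a 4-byte big-endian length chosen by the sender
                let b = self.take(4)?;
                let len = u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as usize;
                self.read_array(len)
            }
            other => Err(DecodeError::Unsupported(other)),
        }
    }

    fn read_array(&mut self, len: usize) -> Result<Value, DecodeError> {
        // Every element needs at least one byte, so a declared length larger
        // than the remaining input is impossible; reject it before allocating.
        if len > self.remaining() {
            return Err(DecodeError::LengthTooLarge);
        }
        // Bound recursion so deeply nested input cannot overflow the stack.
        if self.depth >= self.max_depth {
            return Err(DecodeError::DepthExceeded);
        }
        self.depth += 1;
        let mut items = Vec::with_capacity(len); // len <= remaining bytes here
        let mut result = Ok(());
        for _ in 0..len {
            match self.read_value() {
                Ok(v) => items.push(v),
                Err(e) => {
                    result = Err(e);
                    break;
                }
            }
        }
        self.depth -= 1;
        result.map(|_| Value::Array(items))
    }
}
```

Without the length check, the five-byte input `dd ff ff ff ff` would ask `Vec::with_capacity` for roughly 4.3 billion elements.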

Of course, a serializer cannot tell whether well-formed data is legitimate, so authentication must be handled on the application side.

Summary

Many of the optimizations and security features were modeled on MessagePack for C#, whose implementation is incredibly well done. Even so, optimizing code like binary decoding is definitely easier in Rust than in C#: the strong type system makes writing generic code easy and enjoyable.

zerompk has turned out really well, so I highly recommend trying it out!

DEV Community