Iwo Plaza

I tricked myself into making a type-safe binary manipulation library

Rarely do TypeScript developers have to interact with binary data directly. We can usually depend on abstractions or ready-made solutions, but where is the fun in that? I thought to myself:

"What is the most TypeScript way of interacting with binary?"

With the release of Typed Binary 4, it seems like the perfect time to go through how the library has helped me (and hopefully can help you) confidently tackle binary-specific problems in TypeScript.

Binary data in TypeScript

Whether we're decoding or encoding custom file formats, talking to a game server across the network, or running code on the GPU, we eventually drop down to the level of raw binary. Having to interact with binary data in a TypeScript codebase can be a daunting task, even with typed arrays. Add on the complicated alignment rules of the std140 layout standard, and the thought of debugging whether the GPU correctly interprets our Minecraft-clone voxel grid is… not fun. There has to be a better way.

(Image: the "THIS IS FINE" meme, but with a frog and visual glitches)

A good set of primitives

Let's start off with the basics. We can encode binary primitives into serial streams and read them back.

import { BufferWriter, u32, f32 } from 'typed-binary';

// We can either statically enforce a maximum buffer size, or
// use size estimation (shown later).
const buffer = new ArrayBuffer(8);

// Anything that implements the ISerialOutput interface can be
// written to. The built-in BufferWriter operates on ArrayBuffers.
const output = new BufferWriter(buffer);

u32.write(output, 500);
f32.write(output, 9.5);

Then when we want to read the data back, we can do the same sequence and replace writes with reads.

import { BufferReader } from 'typed-binary';

const input = new BufferReader(buffer);

const id = u32.read(input); // 500
const health = f32.read(input); // 9.5

If you care about consistent byte order across devices (and you usually should), endianness can be set explicitly when creating a BufferWriter/BufferReader. By default, it is set to the automatically detected system endianness.
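
For instance, here is a minimal sketch of pinning the byte order explicitly. I'm assuming the constructors accept an options object with an endianness field; the exact option name may differ, so check the typed-binary docs:

import { BufferWriter, BufferReader, u32 } from 'typed-binary';

const buf = new ArrayBuffer(4);

// NOTE: 'endianness' is an assumed option name; verify it against the docs.
const bigEndianOut = new BufferWriter(buf, { endianness: 'big' });
u32.write(bigEndianOut, 500);

const bigEndianIn = new BufferReader(buf, { endianness: 'big' });
u32.read(bigEndianIn); // 500, regardless of the host's native byte order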

There are many primitives built into the library, and we can extend their usefulness with compound schemas.

Representing compound data

Instead of writing and reading a set of primitives to build up bigger structures, we can use compound types like objects, tuples or arrays.

import { object, dynamicArrayOf, tupleOf, string, u32, f32 } from 'typed-binary';

// properties are encoded next to each other in the
// order they are defined in the schema.
const Player = object({
  id: u32,
  name: string, // a null-terminated string
  health: f32,
  position: tupleOf([f32, f32, f32]),
  // length of the array gets encoded into the binary,
  // therefore supporting a variable-length array.
  inventory: dynamicArrayOf(object({
    id: u32,
    quantity: u32,
  })),
});

const player = {
  id: 500,
  name: 'Dave',
  health: 9.5,
  position: [0.1, 12.0, 16.7],
  inventory: [],
} as const;

// Creating a buffer that is sized "just enough" to hold a
// single player.
const buffer = new ArrayBuffer(Player.measure(player).size);

// The function only accepts values that match the schema.
Player.write(new BufferWriter(buffer), player);
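
Reading the player back is symmetric. A minimal sketch, reusing the buffer from above:

import { BufferReader } from 'typed-binary';

const readBack = Player.read(new BufferReader(buffer));

console.log(readBack.id);   // 500
console.log(readBack.name); // 'Dave'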

Compound schemas are built in a way that lets TypeScript infer what JavaScript value the binary actually represents. We can extract the value type encoded by a schema using the Parsed utility type:

import type { Parsed } from 'typed-binary';

type Player = Parsed<typeof Player>;

expectTypeOf<Player>().toMatchTypeOf<{
  id: number;
  name: string;
  health: number;
  position: [number, number, number];
  inventory: { id: number, quantity: number }[];
}>();

This further reduced the number of errors I was making when transitioning between typed values and raw binary.
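
As a quick, hypothetical illustration of that safety, a value that does not match the schema is rejected at compile time:

const badPlayer = { id: 500, name: 'Dave' };

// @ts-expect-error - 'health', 'position' and 'inventory' are missing.
Player.write(new BufferWriter(buffer), badPlayer);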

My favorite feature

Recursion

The initial use case for this library required support for recursive schemas. Let's see an example of what I mean. How would you define a schema that supports the following value?

const expr = {
  type: 'add',
  left: {
    type: 'literal',
    value: 123,
  },
  right: {
    type: 'multiply',
    left: {
      type: 'literal',
      value: 0.2,
    },
    right: {
      type: 'literal',
      value: 6,
    },
  },
};

Seems easy enough; let's try using generic schemas to do so:

const Expr = generic({
  /* common properties, none in this example */
}, {
  // a 'literal' sub-type
  'literal': {
    value: f32,
  },
  // an 'add' sub-type
  'add': {
    left: Expr,
    right: Expr,
  },
  // a 'multiply' sub-type
  'multiply': {
    left: Expr,
    right: Expr,
  },
});

We would expect this to work by default, but unfortunately, this recursive definition prohibits TypeScript from properly inferring the type of this schema.

In order to have working type inference, we need to introduce some indirection.

// The `keyed` schema associates the given key ('expr' in this
// case) with the schema that gets returned from the function
// in its second parameter.
//
// The 'Expr' parameter, which was named just like the
// resulting schema to draw a parallel between them, is
// just a reference to the schema that will be returned.
const Expr = keyed('expr', (Expr) =>
  generic({
    /* common properties, none in this example */
  }, {
    // a 'literal' sub-type
    'literal': {
      value: f32,
    },
    // an 'add' sub-type
    'add': {
      left: Expr,
      right: Expr,
    },
    // a 'multiply' sub-type
    'multiply': {
      left: Expr,
      right: Expr,
    },
  }),
);

The references are resolved when parsing the value of a keyed schema (using the Parsed<T> utility type). If the "how" of this mechanism interests you, let me know in the comments! I will make a follow-up post about the inner workings of Typed Binary.
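
As a quick sketch of what that buys us, the inferred type of the recursive schema is itself recursive, so nested expression values type-check as expected:

import type { Parsed } from 'typed-binary';

type Expr = Parsed<typeof Expr>;

// A nested expression value checked against the inferred recursive type.
const doubled: Expr = {
  type: 'multiply',
  left: { type: 'literal', value: 2 },
  right: { type: 'literal', value: 3 },
};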

Extend to your heart's content

I wanted to make sure that the built-in primitives and schemas support 90% of use cases out of the box, but what if I needed to encode a JS object as a list of key-value pairs? Or, in the case of WebGPU integration, what about struct values that have to adhere to specific alignment rules instead of being tightly packed, which is the default?

Let's see a simple example of implementing a custom schema that encodes an angle in radians with 2 bytes of precision.

import { Schema, ISerialInput, ISerialOutput, MaxValue, IMeasurer, Measurer } from 'typed-binary';

/**
 * A schema storing radians with 2 bytes of precision.
 */
class RadiansSchema extends Schema<number> {
  read(input: ISerialInput): number {
    const low = input.readByte();
    const high = input.readByte();

    const discrete = (high << 8) | low;
    return (discrete / 65535) * Math.PI;
  }

  write(output: ISerialOutput, value: number): void {
    // Wrap the value into the range [0, Math.PI).
    const wrapped = ((value % Math.PI) + Math.PI) % Math.PI;
    // Quantize the value to an integer in the range [0, 65535].
    const discrete = Math.min(Math.floor((wrapped / Math.PI) * 65535), 65535);

    const low = discrete & 0xff;
    const high = (discrete >> 8) & 0xff;

    output.writeByte(low);
    output.writeByte(high);
  }

  measure(
    _: number | MaxValue,
    measurer: IMeasurer = new Measurer(),
  ): IMeasurer {
    // The size of the data serialized by this schema
    // doesn't depend on the actual value. It's always 2 bytes.
    return measurer.add(2);
  }
}

export const radians = new RadiansSchema();
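
The custom schema then composes with the built-in ones like any other. Here is a minimal usage sketch; the Orientation schema below is made up for illustration:

import { object, BufferWriter, BufferReader } from 'typed-binary';

// A hypothetical compound schema that mixes custom and built-in primitives.
const Orientation = object({
  yaw: radians,
  pitch: radians,
});

const buffer = new ArrayBuffer(Orientation.measure({ yaw: 0, pitch: 0 }).size);
Orientation.write(new BufferWriter(buffer), { yaw: 1.23, pitch: 0.5 });

const readBack = Orientation.read(new BufferReader(buffer));
// readBack.yaw is roughly 1.23, within the schema's 2-byte precision.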

Final remarks

Thank you for reading this far! I hope you found Typed Binary interesting. Let me know your first impressions and thoughts about using it in the comments 💚🐸
