Microsoft Azure

Passing structured data between C++ and JavaScript in Web Assembly

Bryan Hughes ・7 min read

I recently finished getting the messaging stack of my wireless LED control system running in Node.js via Web Assembly (WASM for short). The last major hurdle I encountered was how to pass a lot of structured data between JavaScript and C++.

The Scenario

The core data that is passed around through my messaging stack is a set of control parameters needed to control LED animations. This information is defined in C++ using the following structs:

#define NUM_WAVES 4

struct RVLWaveChannel {
  uint8_t a = 0;
  uint8_t b = 0;
  int8_t w_t = 0;
  int8_t w_x = 0;
  int8_t phi = 0;
};

struct RVLWave {
  RVLWaveChannel h;
  RVLWaveChannel s;
  RVLWaveChannel v;
  RVLWaveChannel a;
};

struct RVLWaveSettings {
  uint8_t timePeriod = 255;
  uint8_t distancePeriod = 32;
  RVLWave waves[NUM_WAVES];
};

My animation algorithm uses a bunch of coefficients to calculate a series of superimposed sine waves. The details of how this works are for another day though; just know that it looks really really pretty! What you do need to know for this blog post is that there are 82 (!!) numerical values that need to be passed from JavaScript to C++, and vice versa. That's 2 top-level values plus 4 waves × 4 channels × 5 values each.

As I mentioned in the previous post in this series, you can only pass numbers between C++ and JavaScript in WASM. Our data is all numerical, but it's also structured. We must preserve this structure as well as the numerical values when passing data around.

We could implement a function that takes 82 arguments...but I mean, c'mon, really? I just know I'd mess it up! This approach would also make it really hard to update if the data changed. So we need something else. I thought about serializing to a JSON string and then deserializing it, but that means a lot of work, processing time, and code bloat on the C++ side.

What I needed was something clever...

My Solution

And clever was what I found! I remembered that structs in C/C++ are laid out in memory in a deterministic manner. I realized that with this knowledge, I could directly marshal and unmarshal the data from the memory array in JavaScript, just like I did with strings!

To illustrate what I'm talking about, let's take a very simple struct:

struct MyStruct {
  uint8_t a = 0x12;
  uint16_t b = 0x3456;
  uint32_t c = 0x789ABCDE;
};

MyStruct str;

If we inspect the memory where str lives (i.e. starting at the numerical value of &str in C/C++ parlance), we will see the following in WASM:

str + 0   str + 1     str + 2   str + 3   str + 4   str + 5   str + 6   str + 7
0x12      (padding)   0x56      0x34      0xDE      0xBC      0x9A      0x78

By using the sizeof() operator in C++, we know that this struct is 8 bytes large, which matches the layout above. Two details stand out. First, WASM is little-endian, so the least significant byte of b and c comes first. Second, the compiler inserts one byte of padding after a so that b starts at an even address. Padding aside, the values are stacked right next to each other in memory! All we need to know is the "memory offset" of each value relative to the base pointer, i.e. the + n part in the table.

So how do we determine this offset? C/C++ always arranges these properties in memory in the order they are declared in the struct in the source code. In this example, a comes first, followed by b, followed by c, because I declared them in that order in the code. If we switched the order of b and c so that b was at the end of the source code, then b would also be at the end of the memory block.

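This means that we can calculate each offset by summing up the size of every entry that came before it, as long as the compiler adds no alignment padding between members. That holds for my structs, because every member is a single byte.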

Automating the calculation of offsets

Calculating these by hand is error-prone though, especially when structs reference other structs as mine do. I would also have to recalculate these offsets if I ever changed the data in the structs. This is a perfect opportunity to automate the process with a build-time script!

You can see the (admittedly poorly commented) Node.js script I wrote on GitHub.

The first thing I did was write a quick-n-dirty C++ parser using regexes. This parser produces a data structure that looks like this:

const structs = {
  RVLWaveChannel: [
    { name: 'a', type: 'uint8_t', initialValue: 0 },
    { name: 'b', type: 'uint8_t', initialValue: 0 },
    { name: 'w_t', type: 'int8_t', initialValue: 0 },
    { name: 'w_x', type: 'int8_t', initialValue: 0 },
    { name: 'phi', type: 'int8_t', initialValue: 0 }
  ],
  RVLWave: [
    { name: 'h', type: 'RVLWaveChannel', initialValue: undefined },
    { name: 's', type: 'RVLWaveChannel', initialValue: undefined },
    { name: 'v', type: 'RVLWaveChannel', initialValue: undefined },
    { name: 'a', type: 'RVLWaveChannel', initialValue: undefined }
  ],
  RVLWaveSettings: [
    { name: 'timePeriod', type: 'uint8_t', initialValue: 255 },
    { name: 'distancePeriod', type: 'uint8_t', initialValue: 32 },
    { name: 'waves', type: 'array', subType: 'RVLWave', arraySize: 4 }
  ]
};
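
For a feel of what such a parser could look like, here is a minimal sketch. It is not the script from GitHub: the function name parseStructs and the regexes are assumptions made for illustration, and it only understands members shaped like the ones in the structs above.

```javascript
// Hypothetical sketch, not the real build script: parse simple C++ structs with regexes.
// Only understands `type name;`, `type name = 123;` and `type name[SIZE];` members.
function parseStructs(source) {
  // Collect `#define NAME 123` constants so array sizes such as NUM_WAVES resolve to numbers.
  const defines = {};
  for (const [, name, value] of source.matchAll(/^#define\s+(\w+)\s+(\d+)/gm)) {
    defines[name] = Number(value);
  }

  const structs = {};
  // These struct bodies contain no nested braces, so a lazy match up to `};` is enough.
  for (const [, structName, body] of source.matchAll(/struct\s+(\w+)\s*\{([\s\S]*?)\};/g)) {
    structs[structName] = [];
    const memberRegex = /(\w+)\s+(\w+)(?:\[(\w+)\])?(?:\s*=\s*(\w+))?\s*;/g;
    for (const [, type, name, arraySize, initialValue] of body.matchAll(memberRegex)) {
      if (arraySize !== undefined) {
        // Array member: remember the element type and the resolved element count.
        structs[structName].push({
          name,
          type: 'array',
          subType: type,
          arraySize: arraySize in defines ? defines[arraySize] : Number(arraySize)
        });
      } else {
        structs[structName].push({
          name,
          type,
          initialValue: initialValue === undefined ? undefined : Number(initialValue)
        });
      }
    }
  }
  return structs;
}

const cppSource = `
#define NUM_WAVES 4

struct RVLWaveChannel {
  uint8_t a = 0;
  uint8_t b = 0;
  int8_t w_t = 0;
  int8_t w_x = 0;
  int8_t phi = 0;
};

struct RVLWave {
  RVLWaveChannel h;
  RVLWaveChannel s;
  RVLWaveChannel v;
  RVLWaveChannel a;
};

struct RVLWaveSettings {
  uint8_t timePeriod = 255;
  uint8_t distancePeriod = 32;
  RVLWave waves[NUM_WAVES];
};
`;

const parsedStructs = parseStructs(cppSource);
console.log(parsedStructs.RVLWaveSettings);
```

A regex approach like this breaks down quickly on comments, nested braces, or more elaborate declarations, which is why it only makes sense for small, well-behaved headers.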

Now we have a representation of the C++ structs in JavaScript. We're not quite ready to start calculating offsets just yet though. We have references in two of our structs to other structs, and we also have an array. When this struct is instantiated in C++, these different structs and arrays aren't represented as pointers to multiple memory blocks. Rather, the structs and arrays are "flattened" such that they all sit in a single 82-byte memory block.

To represent this flattening in memory accurately, we must flatten our own representation of these structs and arrays too. I accomplished this by writing a while loop that iterates through each entry in the "root" struct (RVLWaveSettings in this case). We then replace any entry whose type value is not a primitive from stdint.h (i.e. something of the form u?int(8|16|32)_t) with its "referenced" type. The way we do this replacement depends on whether it's a struct or an array. The while loop keeps running until there are no more replacements to be made.

When the loop encounters an array of items, it "unrolls" the array. In other words, it replaces:

{ name: 'waves', type: 'array', subType: 'RVLWave', arraySize: 4 }

with:

{ name: 'waves[0]', type: 'RVLWave', initialValue: undefined }
{ name: 'waves[1]', type: 'RVLWave', initialValue: undefined }
{ name: 'waves[2]', type: 'RVLWave', initialValue: undefined }
{ name: 'waves[3]', type: 'RVLWave', initialValue: undefined }

When an iteration of the loop encounters a struct type, it replaces the reference to the struct with all of that struct's entries, prefixing each name with the name of the entry being replaced. In other words, it replaces:

{ name: 'waves[0]', type: 'RVLWave', initialValue: undefined }

with:

{ name: 'waves[0].h', type: 'RVLWaveChannel', initialValue: undefined }
{ name: 'waves[0].s', type: 'RVLWaveChannel', initialValue: undefined }
{ name: 'waves[0].v', type: 'RVLWaveChannel', initialValue: undefined }
{ name: 'waves[0].a', type: 'RVLWaveChannel', initialValue: undefined }

If we keep running this algorithm, we eventually end up with a set of entries that look like this:

{ name: "timePeriod", type: "uint8_t", initialValue: 255, size: 1 }
{ name: "distancePeriod", type: "uint8_t", initialValue: 32, size: 1 }
{ name: "waves[0].h.a", type: "uint8_t", initialValue: 0, size: 1 }
{ name: "waves[0].h.b", type: "uint8_t", initialValue: 0, size: 1 }
{ name: "waves[0].h.w_t", type: "int8_t", initialValue: 0, size: 1 }
{ name: "waves[0].h.w_x", type: "int8_t", initialValue: 0, size: 1 }
{ name: "waves[0].h.phi", type: "int8_t", initialValue: 0, size: 1 }
{ name: "waves[0].s.a", type: "uint8_t", initialValue: 0, size: 1 }
...
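
The flattening loop described above could be sketched roughly as follows. The function name flatten and the PRIMITIVE_SIZES table are assumptions for illustration rather than code from the actual script, and the sketch takes for granted that the compiler adds no padding between members.

```javascript
// Sizes in bytes of the stdint.h primitives this sketch understands.
const PRIMITIVE_SIZES = {
  uint8_t: 1, int8_t: 1, uint16_t: 2, int16_t: 2, uint32_t: 4, int32_t: 4
};

// Hypothetical helper: expand arrays and nested structs until only primitives remain.
function flatten(structs, rootName) {
  let entries = structs[rootName].map((entry) => ({ ...entry }));
  let replaced = true;
  while (replaced) {
    replaced = false;
    const next = [];
    for (const entry of entries) {
      if (entry.type === 'array') {
        // Unroll the array into one entry per element.
        for (let i = 0; i < entry.arraySize; i++) {
          next.push({ name: `${entry.name}[${i}]`, type: entry.subType, initialValue: undefined });
        }
        replaced = true;
      } else if (entry.type in PRIMITIVE_SIZES) {
        // Primitives stay as they are and pick up their size.
        next.push({ ...entry, size: PRIMITIVE_SIZES[entry.type] });
      } else if (structs[entry.type]) {
        // Replace a struct reference with that struct's members, prefixing their names.
        for (const member of structs[entry.type]) {
          next.push({ ...member, name: `${entry.name}.${member.name}` });
        }
        replaced = true;
      } else {
        throw new Error(`Unknown type "${entry.type}"`);
      }
    }
    entries = next;
  }
  return entries;
}

const channelMembers = [
  { name: 'a', type: 'uint8_t', initialValue: 0 },
  { name: 'b', type: 'uint8_t', initialValue: 0 },
  { name: 'w_t', type: 'int8_t', initialValue: 0 },
  { name: 'w_x', type: 'int8_t', initialValue: 0 },
  { name: 'phi', type: 'int8_t', initialValue: 0 }
];
const structDefinitions = {
  RVLWaveChannel: channelMembers,
  RVLWave: ['h', 's', 'v', 'a'].map((name) => (
    { name, type: 'RVLWaveChannel', initialValue: undefined }
  )),
  RVLWaveSettings: [
    { name: 'timePeriod', type: 'uint8_t', initialValue: 255 },
    { name: 'distancePeriod', type: 'uint8_t', initialValue: 32 },
    { name: 'waves', type: 'array', subType: 'RVLWave', arraySize: 4 }
  ]
};

const flatEntries = flatten(structDefinitions, 'RVLWaveSettings');
console.log(flatEntries.length, flatEntries[2].name);
```

Each pass rebuilds the whole list, which is wasteful but keeps the logic easy to follow; for a build-time script over a few dozen entries that trade-off seems reasonable.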

With this, we can now loop through and calculate the offsets! I iterate through each entry and keep a running sum of the sizes; the sum before adding an entry's own size is the memory offset for that entry, stored as index below. I then write this information to a JSON file that looks like this:

{
  "totalSize": 82,
  "entryDictionary": {
    "timePeriod": {
      "name": "timePeriod",
      "type": "uint8_t",
      "initialValue": 255,
      "size": 1,
      "index": 0
    },
    "distancePeriod": {
      "name": "distancePeriod",
      "type": "uint8_t",
      "initialValue": 32,
      "size": 1,
      "index": 1
    },
    "waves[0].h.a": {
      "name": "waves[0].h.a",
      "type": "uint8_t",
      "initialValue": 0,
      "size": 1,
      "index": 2
    },
    ...
  }
}
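
The running-sum step is small enough to sketch in a few lines. The function name computeOffsets is an assumption for illustration, and the sketch relies on there being no alignment padding between entries, which is the case when every entry is one byte wide.

```javascript
// Hypothetical helper: turn flattened entries into the offset table shown above.
function computeOffsets(flatEntries) {
  const entryDictionary = {};
  let offset = 0;
  for (const entry of flatEntries) {
    // The running sum so far is where this entry starts in memory.
    entryDictionary[entry.name] = { ...entry, index: offset };
    offset += entry.size;
  }
  // After the last entry, the running sum is the size of the whole struct.
  return { totalSize: offset, entryDictionary };
}

// Only the first three flattened entries, to keep the example short.
const sampleEntries = [
  { name: 'timePeriod', type: 'uint8_t', initialValue: 255, size: 1 },
  { name: 'distancePeriod', type: 'uint8_t', initialValue: 32, size: 1 },
  { name: 'waves[0].h.a', type: 'uint8_t', initialValue: 0, size: 1 }
];

const offsetTable = computeOffsets(sampleEntries);
console.log(JSON.stringify(offsetTable, null, 2));
```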

Using offsets to read from a C++ struct in JavaScript

Now that we have our offsets, we can finally start passing data back and forth! Let's start by talking about how we read data from C++ into JavaScript. We start the same as we did with strings: by creating a Node.js Buffer object that represents the area of memory containing the struct we want to read. Then we iterate through each element in the offset data and read the value at the given offset:

const view = Buffer.from(memory.buffer, waveSettingsPointer, structData.totalSize);
for (const entryName in structData.entryDictionary) {
  const structEntry = structData.entryDictionary[entryName];
  let value = 0;
  switch (structEntry.type) {
    case 'uint8_t':
      value = view.readUInt8(structEntry.index);
      break;
    case 'int8_t':
      value = view.readInt8(structEntry.index);
      break;
    // WASM memory is little-endian, so multi-byte values need the LE readers
    case 'uint16_t':
      value = view.readUInt16LE(structEntry.index);
      break;
    case 'int16_t':
      value = view.readInt16LE(structEntry.index);
      break;
    case 'uint32_t':
      value = view.readUInt32LE(structEntry.index);
      break;
    case 'int32_t':
      value = view.readInt32LE(structEntry.index);
      break;
    default:
      throw new Error(`Unexpected struct type "${structEntry.type}"`);
  }
  // Assign the value we just read to a JavaScript mirror object
  // using some dense code I'd rather not show here :-P
}
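
The assignment step is not shown here, but one way it could work is to split each entry name into keys and walk a plain object, creating arrays and objects as needed. The helper below, setByPath, is a hypothetical illustration and not the code from bridge.ts.

```javascript
// Hypothetical helper: assign `value` at a path like 'waves[0].h.a' inside `target`.
function setByPath(target, path, value) {
  // 'waves[0].h.a' becomes ['waves', '0', 'h', 'a'].
  const keys = path.split(/[.\[\]]+/).filter((key) => key !== '');
  let node = target;
  for (let i = 0; i < keys.length - 1; i++) {
    const key = keys[i];
    if (node[key] === undefined) {
      // Create an array when the next key is a numeric index, otherwise an object.
      node[key] = /^\d+$/.test(keys[i + 1]) ? [] : {};
    }
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

const mirror = {};
setByPath(mirror, 'timePeriod', 255);
setByPath(mirror, 'waves[0].h.a', 7);
setByPath(mirror, 'waves[1].s.phi', -3);
console.log(JSON.stringify(mirror));
```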

We then end up with a data structure in JavaScript defined using the following TypeScript interfaces:

export interface IWaveChannel {
  a: number; // Default 0
  b: number; // Default 0
  w_t: number; // Default 0
  w_x: number; // Default 0
  phi: number; // Default 0
}

export interface IWave {
  h: IWaveChannel;
  s: IWaveChannel;
  v: IWaveChannel;
  a: IWaveChannel;
}

export interface IWaveParameters {
  timePeriod?: number; // Default 255
  distancePeriod?: number; // Default 32
  waves: IWave[];
}

Looks familiar, right?

Writing to a C++ struct from JavaScript is effectively the reverse of the above. To see all the code that does the marshalling and unmarshalling, check out bridge.ts on GitHub.
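
As a rough idea of the write direction, here is a minimal sketch. The names getByPath and writeStruct are assumptions for illustration, not the functions in bridge.ts. It handles only the 8-bit types that appear in the structs above, and it uses a standalone Buffer as a stand-in for a view onto WASM memory.

```javascript
// Hypothetical helper: read the value at a path like 'waves[0].h.phi' from a plain object.
function getByPath(source, path) {
  return path
    .split(/[.\[\]]+/)
    .filter((key) => key !== '')
    .reduce((node, key) => node[key], source);
}

// Hypothetical helper: write each value into the buffer at the offset from the table.
function writeStruct(view, structData, values) {
  for (const entryName in structData.entryDictionary) {
    const structEntry = structData.entryDictionary[entryName];
    const value = getByPath(values, entryName);
    switch (structEntry.type) {
      case 'uint8_t':
        view.writeUInt8(value, structEntry.index);
        break;
      case 'int8_t':
        view.writeInt8(value, structEntry.index);
        break;
      default:
        throw new Error(`Unexpected struct type "${structEntry.type}"`);
    }
  }
}

// A cut-down offset table with three entries, just for the example.
const sampleStructData = {
  totalSize: 3,
  entryDictionary: {
    'timePeriod': { type: 'uint8_t', index: 0 },
    'distancePeriod': { type: 'uint8_t', index: 1 },
    'waves[0].h.phi': { type: 'int8_t', index: 2 }
  }
};

// In real use this view would wrap the WASM memory at the struct's pointer.
const writeView = Buffer.alloc(sampleStructData.totalSize);
writeStruct(writeView, sampleStructData, {
  timePeriod: 255,
  distancePeriod: 32,
  waves: [{ h: { phi: -3 } }]
});
console.log(writeView);
```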

And that's that, we can now pass structs from C++ into JavaScript and vice versa! It may sound like a whole lot of work for something that you might think would be simple, but that's kinda turning out to be par for the course with WASM. Regardless, this mechanism marks the next big step towards integrating this system with Azure IoT Edge via Node.js!

If I have time in the future, I'd love to beef up my parsing script to use a proper C++ AST parser so it can work with a wider range of code, and publish all of this as an easy-to-consume module on npm.

You can check out the complete code for the WASM powered Node.js messaging library in the RVL-Node repository on GitHub.

Posted by Bryan Hughes (@nebrius)

Bryan Hughes is a long-time member of the Node.js and NodeBots communities, and a tech activist.
