DEV Community

Igor Proskurin

Playing with low-level memory allocation in WebAssembly

I recently wrote a blog post about my first experience with WebAssembly (WASM). In that post, I briefly touched on how to set up an SDK for writing C/C++ code, and how to compile a simple C++ function that takes a couple of numeric values from JavaScript, runs as a WASM binary in a browser, and returns a value.

For those who just jumped in: WebAssembly is a cool cross-platform binary format, an assembly language, and a virtual machine that runs this binary in a browser. What can it do? Well, it can silently mine cryptocurrency in the background while you browse your favorite web pages. And guess who pays for the electricity?

Well, besides cryptocurrency abuse, it is an interesting technology for running computationally heavy code client-side with reasonable performance.

Where to start

In this post, I am playing around with Emscripten. It is a WASM compiler toolchain that wraps clang to compile C/C++ source code into the binary .wasm format. It also generates JavaScript glue code and provides an API for using this WASM binary from JavaScript. To get started, just look into the MDN docs and the Emscripten SDK documentation.

Managing memory with Emscripten

Before diving into high-level Emscripten features such as Embind, I decided to look into its low-level memory model.

Here is a toy problem. We have a C function that takes an array of double-precision values, does something with them, and returns a number. It can be as simple as this:

// malloc_testing.c

#include <assert.h>
#include <math.h>
#include <stdio.h>

double vector_norm(double* p, int n) {
    int i;
    double norm2 = 0;

    assert(n > 0 && "number of elements must be positive");
    assert(p && "must be a valid pointer");
    printf("received: n = %d\n", n);

    for (i = 0; i < n; i++) {
        printf("p[%d] = %.3f\n", i, p[i]);
        norm2 += p[i] * p[i];
    }
    return sqrt(norm2);
}

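I sprinkled some asserts and old-fashioned printouts into the code for convenience. If you compile it as C++, don't forget to wrap it in extern "C" {}: otherwise C++ name mangling changes the symbol name, and _vector_norm can no longer be exported under that name.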

We already know that this function can be called from JavaScript using the ccall() or cwrap() methods, but how do we pass an array from JavaScript to our C code?

Let's compile this function into a binary using the Emscripten emcc compiler:

$ emcc malloc_testing.c -o malloc_testing.js -O0 -sASSERTIONS=2 -sEXPORTED_FUNCTIONS=_vector_norm,_malloc,_free -sEXPORTED_RUNTIME_METHODS=cwrap,ccall,setValue

Here, I build without optimizations (-O0) and with extra runtime checks (-sASSERTIONS=2) to make debugging easier, and I export some useful stuff: _malloc, _free, setValue, and of course our C function, _vector_norm (note the leading underscore).

Now we have two files: malloc_testing.wasm, which contains the binary, and malloc_testing.js, the JavaScript glue code that lets us use it from a web page. You can also run it in Node.js; there, compiling with -sMODULARIZE makes the generated code easy to load as a module.

Allocating memory from JavaScript

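What does the memory model of the WASM VM look like? For C/C++ code it looks pretty normal: static data, heap, and stack all live in a single linear memory (the code itself is kept separately by the VM), JavaScript sees that memory as one contiguous ArrayBuffer, and pointers are just byte offsets into it. We can allocate stuff on the heap and pass pointers around. Luckily for us, we also asked emcc to export the allocator, _malloc, to the JavaScript glue code, so now we can allocate memory on the WASM heap from JavaScript.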

In theory, the whole process is easy: allocate memory on the heap and get a pointer back in JavaScript, write something into this memory, and pass the pointer to the C function.

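Before wiring this up to Emscripten, the three steps can be sketched in plain JavaScript. This is a toy model, not Emscripten's runtime: an ArrayBuffer plays the role of WASM linear memory, heap64 plays the role of Module.HEAPF64, and toyMalloc and toyVectorNorm are hypothetical stand-ins for _malloc and vector_norm. The only point is that a "pointer" is a byte offset into one big buffer.

```javascript
// Toy model, NOT Emscripten: one ArrayBuffer plays the role of WASM linear memory,
// and heap64 is a Float64Array view over it (compare Module.HEAPF64).
const memory = new ArrayBuffer(1024);
const heap64 = new Float64Array(memory);

// Hypothetical bump allocator standing in for _malloc: returns byte offsets
// ("pointers") rounded up to 8-byte alignment and never frees anything.
let nextFree = 8; // byte 0 stays unused so that 0 can mean NULL
function toyMalloc(nbytes) {
  const size = Math.ceil(nbytes / 8) * 8;            // keep 8-byte alignment
  if (nextFree + size > memory.byteLength) return 0; // like malloc: NULL on failure
  const ptr = nextFree;
  nextFree += size;
  return ptr;
}

// Stand-in for vector_norm(double* p, int n): it sees only a byte pointer and a count.
function toyVectorNorm(ptr, n) {
  let norm2 = 0;
  for (let i = 0; i < n; i++) {
    const x = heap64[ptr / 8 + i]; // byte offset -> float64 index
    norm2 += x * x;
  }
  return Math.sqrt(norm2);
}

// 1) allocate, 2) write, 3) pass the pointer.
const input = new Float64Array([0, 1, 2, 3, 5]);
const buf = toyMalloc(input.length * input.BYTES_PER_ELEMENT);
heap64.set(input, buf / input.BYTES_PER_ELEMENT);
const toyResult = toyVectorNorm(buf, input.length);
console.log(`buf = ${buf}, result = ${toyResult}`); // buf = 8, result = 6.244997998398398
```

Run with Node.js, this needs nothing from Emscripten; the real glue code does the same bookkeeping on a much larger buffer.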

Let's try it. I will use a simple web page that runs our C function in the browser when you press a button.

<!DOCTYPE html>
<html lang="en">
<body>

<button id="mybutton">Run</button>

<script>
    document.getElementById("mybutton").addEventListener("click", ()=>{

        const vectorNorm = Module.cwrap(
            'vector_norm',          // no underscore
            'number',               // return type
            ['number', 'number']);  // param types

        const myTypedArray = new Float64Array([0, 1, 2, 3, 5]);

        // allocate empty buffer
        let buf = Module._malloc(myTypedArray.length * myTypedArray.BYTES_PER_ELEMENT);

        // fill  this buffer with our stuff
        Module.HEAPF64.set(myTypedArray, buf / myTypedArray.BYTES_PER_ELEMENT);

        // call our function and pass pointer to buffer
        const result = vectorNorm(buf, myTypedArray.length);

        console.log(`result = ${result}`);

        Module._free(buf);  // no leaks!
    });
</script>

<script src="malloc_testing.js"></script>

</body>

Here, I first create a JavaScript typed array, a Float64Array, which stores the values as contiguous 64-bit floats.

After that, I allocate a buffer on the WASM heap by calling _malloc, which we exported when compiling our C file. The buffer is not initialized, so its contents are unspecified until we write to it. _malloc returns a pointer, buf, to the allocated block, and in JavaScript this pointer is simply a number, a byte offset into WASM memory (very "safe", eh?).

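Next, we fill the allocated memory. I use Module.HEAPF64.set(myTypedArray, buf / myTypedArray.BYTES_PER_ELEMENT), which takes two arguments: the source array and the position in HEAPF64 at which to start writing. Note the units! HEAPF64 is a Float64Array view over the whole WASM memory, so that position counts 8-byte elements, not bytes: the byte pointer buf must be divided by 8 first. Since _malloc returns memory aligned to at least 8 bytes, the division is exact. It actually took me more than an hour to figure this out, since the Emscripten API docs are quite, hm, emscryptic on this point. Thanks to ChatGPT and this post.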
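To make the byte-versus-element distinction concrete, here is a small sketch on a toy buffer. The ArrayBuffer is an assumption standing in for WASM memory, and f64Index is a hypothetical helper, not an Emscripten API. It converts a byte pointer into a Float64Array index, rejects misaligned pointers, and shows what happens when the raw byte pointer is passed to set() instead. The DataView comparison assumes a little-endian host, which matches WASM's little-endian memory.

```javascript
// Toy model (assumption): an ArrayBuffer standing in for WASM memory, with a
// Float64Array view over it like Module.HEAPF64 and a DataView for byte access.
const wasmMemory = new ArrayBuffer(64);       // 64 bytes = 8 doubles
const heapF64 = new Float64Array(wasmMemory);
const byteView = new DataView(wasmMemory);

// Hypothetical helper: turn a byte pointer into a Float64Array index.
function f64Index(bytePtr) {
  if (bytePtr % Float64Array.BYTES_PER_ELEMENT !== 0) {
    throw new RangeError(`pointer ${bytePtr} is not 8-byte aligned`);
  }
  return bytePtr / Float64Array.BYTES_PER_ELEMENT; // same as bytePtr >> 3
}

const ptr16 = 16;                             // pretend _malloc returned byte 16
heapF64.set([1.5, -2.25], f64Index(ptr16));   // elements 2 and 3 = bytes 16..31
console.log(heapF64[2], byteView.getFloat64(ptr16, true)); // 1.5 1.5
console.log(byteView.getFloat64(ptr16 + 8, true));         // -2.25

// Passing the byte pointer itself asks for element 16 (byte 128), which is past
// the end of this 64-byte toy buffer, so set() throws.
let misuse = "no error";
try { heapF64.set([1.5], ptr16); } catch (e) { misuse = e.name; }
console.log(misuse); // RangeError
```

On the real Emscripten heap, which is much larger than this toy buffer, passing buf instead of buf / 8 may not throw at all; it can silently write eight times further into memory than intended.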

To see how it works, we can replace the call to HEAPF64.set with writing the values one at a time. I came up with something like this (don't do this anywhere near production!):

function setMemoryManually(myArray, ptr) {
    for (const x of myArray) {
        Module.setValue(ptr, x, 'double');  // write one double at byte address ptr
        ptr += myArray.BYTES_PER_ELEMENT;   // move on by 8 bytes
    }
}

It looks ugly, but it works. The low-level function Module.setValue(ptr, value, 'double') writes a single value at the byte address ptr. No tricks this time: since setValue takes a byte address, the pointer is simply incremented by BYTES_PER_ELEMENT, which is 8 for a double. So now I can write setMemoryManually(myTypedArray, buf) in my JavaScript code, and it fills the buffer with the contents of myTypedArray.
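To check that the manual loop and the bulk HEAPF64.set copy really produce the same bytes, here is a Node-runnable sketch on two toy buffers. toySetDouble is a hypothetical DataView-based stand-in for Module.setValue(ptr, value, 'double'), not the real function. Note that it takes a byte address, while the bulk copy takes an element index. The byte comparison assumes a little-endian host.

```javascript
// Assumption: toySetDouble is a DataView-based stand-in for
// Module.setValue(ptr, value, 'double'); it writes one little-endian float64
// at a *byte* address (WASM memory is little-endian).
const manualHeap = new ArrayBuffer(128);
const manualView = new DataView(manualHeap);
const toySetDouble = (bytePtr, value) => manualView.setFloat64(bytePtr, value, true);

// Same loop shape as setMemoryManually: advance a byte pointer by 8 per element.
function setMemoryManuallyToy(myArray, ptr) {
  for (const x of myArray) {
    toySetDouble(ptr, x);
    ptr += myArray.BYTES_PER_ELEMENT;
  }
}

const values = new Float64Array([0, 1, 2, 3, 5]);
const basePtr = 16;
setMemoryManuallyToy(values, basePtr);

// Bulk copy into a second buffer through a Float64Array view; here the offset
// is an element index (basePtr / 8), as with Module.HEAPF64.set.
const bulkHeap = new ArrayBuffer(128);
new Float64Array(bulkHeap).set(values, basePtr / values.BYTES_PER_ELEMENT);

// Compare the two buffers byte by byte.
const manualBytes = new Uint8Array(manualHeap);
const bulkBytes = new Uint8Array(bulkHeap);
const identical = manualBytes.every((b, i) => b === bulkBytes[i]);
console.log(identical); // true
```

The bulk copy is the one to prefer: it is a single call instead of one function call per element.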

Once the memory is filled, we can call our C function from JavaScript. I prefer to wrap it with cwrap first:

const vectorNorm = Module.cwrap('vector_norm',          // no underscore
                                'number',               // return type
                                ['number', 'number']);  // param types

We tell cwrap that the function returns a number and takes two numbers. Yes, the pointer to the allocated buffer is passed as a plain number (looks very "safe" and "portable", eh?). So we can just call vectorNorm from our script:

const result = vectorNorm(buf, myTypedArray.length);
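The allocate, copy, call, free sequence from the page above can be packaged so that _free always runs, even if the call throws. withF64Buffer is a hypothetical helper, not part of Emscripten; it only assumes an object with _malloc, _free and HEAPF64, which is why it can be exercised here with a fake module instead of the real one.

```javascript
// Hypothetical helper (not an Emscripten API): copy a Float64Array into the heap,
// call fn(pointer, length), and always release the buffer afterwards.
// `mod` only needs the shape of Emscripten's Module: _malloc, _free and HEAPF64.
function withF64Buffer(mod, arr, fn) {
  const nbytes = arr.length * Float64Array.BYTES_PER_ELEMENT;
  const ptr = mod._malloc(nbytes);
  if (ptr === 0) throw new Error("_malloc returned NULL");
  try {
    mod.HEAPF64.set(arr, ptr / Float64Array.BYTES_PER_ELEMENT); // byte pointer -> element index
    return fn(ptr, arr.length);
  } finally {
    mod._free(ptr); // runs whether fn returns or throws
  }
}

// With the real module, the page's call could become (untested sketch):
//   const result = withF64Buffer(Module, myTypedArray, vectorNorm);

// A fake module so the helper can be exercised outside the browser.
const fakeModule = {
  HEAPF64: new Float64Array(32),                 // 256-byte toy heap
  live: new Set(),                               // pointers currently allocated
  _malloc(n) { this.live.add(64); return 64; },  // always hands out byte 64 (toy)
  _free(p) { this.live.delete(p); },
};

const norm = withF64Buffer(fakeModule, new Float64Array([0, 1, 2, 3, 5]), (p, n) => {
  const view = fakeModule.HEAPF64.subarray(p / 8, p / 8 + n);
  return Math.sqrt(view.reduce((s, x) => s + x * x, 0));
});
console.log(norm, fakeModule.live.size); // 6.244997998398398 0
```

The try/finally is the design choice that matters here: with a bare _malloc/_free pair, an exception thrown between the two calls leaks the buffer.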

Last step: serve the page from localhost (I just run python -m http.server, which serves the current directory on port 8000 by default) and open it in a browser:

http://localhost:8000/wasm_testing/malloc_testing.html

Do a hard reload, press Run, and here we go (the output shows up in the browser's developer console):

received: n = 5
p[0] = 0.000
p[1] = 1.000
p[2] = 2.000
p[3] = 3.000
p[4] = 5.000
result = 6.244997998398398
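As a quick sanity check of the printed value, vector_norm computes the Euclidean norm of the input array, which here is:

$$
\sqrt{0^2 + 1^2 + 2^2 + 3^2 + 5^2} = \sqrt{0 + 1 + 4 + 9 + 25} = \sqrt{39} \approx 6.2450
$$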

In summary...

Playing with low-level stuff is fun, but I wouldn't use it in production code. Well, at least not without considerable experience and a good understanding of the Emscripten code base.
