Bence Rácz

Posted on Sep 29

Rust + WebAssembly Performance: JavaScript vs. wasm-bindgen vs. Raw WASM (with SIMD)

#rust #webassembly #javascript #performance

When writing code for the web, JavaScript is the default choice: lightweight, interpreted, and heavily optimized by modern engines such as V8 (Chrome, Node.js) and SpiderMonkey (Firefox). But for compute-heavy tasks, JS may not always deliver the performance we need.

Rust is memory-safe, fast, and integrates well with WebAssembly (WASM) through wasm-bindgen and wasm-pack. WebAssembly itself is designed for high-performance, portable, and secure execution inside browsers.

Introduction

In this article, I’ll benchmark four approaches to solving the same problems:

Pure JavaScript
Rust compiled to WASM with wasm-bindgen
Raw WASM exports (extern "C")
Raw WASM with SIMD instructions

We’ll measure them on two tasks:

Array modification: add 2.0 to every element. This highlights memory management and iteration speed.
Fibonacci calculation: iterative version. Recursion was intentionally excluded because the recursive version is slower.

Explanation

wasm-bindgen

A library and CLI tool that lets Rust talk to JavaScript (and vice versa).
It serves as the bridge between Rust and JS.

wasm-pack

A build tool (a CLI) that uses wasm-bindgen under the hood.
Automates the whole process: compiling Rust to WebAssembly, running wasm-bindgen, packaging everything as an npm package, and making it ready to publish or consume from a JS project.
Ensures that the correct target (wasm32-unknown-unknown) is used.
Creates package.json.
Runs tests in headless browsers.
Builds with release optimizations.
Think of it as the workflow manager for Rust + WASM projects.

Float32Array

A Float32Array is a typed array that can only store 32-bit floating-point numbers.
It differs from a regular JavaScript array mainly in type, memory layout, and performance.
It stores numbers in contiguous memory, tightly packed as 32-bit floats, which makes it faster for numeric computations and more memory-efficient.
Ideal for WebGL, audio processing, or numerical calculations.

Implementations

1. Pure JavaScript

function pure_js_modify_array(floatArray) {
  for (let i = 0; i < floatArray.length; i++) {
    floatArray[i] += 2.0;
  }
}

function pure_js_fibon(n) {
  if (n <= 1) return n;
  let a = 0, b = 1;
  for (let i = 2; i <= n; i++) {
    [a, b] = [b, a + b];
  }
  return b;
}

2. Wasm-bindgen crate

We can create a Float32Array in JavaScript, which is highly efficient and can be passed to Rust:

const floatArray = new Float32Array(len);

In Rust, using wasm-bindgen, we can receive this array as a slice (&[f32]). However, wasm-bindgen copies the data from the JavaScript array into WebAssembly linear memory before the call, and the returned Vec<f32> is copied back out into a new Float32Array. This overhead is negligible for small arrays but can become significant for large arrays or performance-critical workloads.

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn wasm_modify_array(arr : &[f32]) -> Vec<f32> {
    arr.iter().map(|&x| x+2.0f32).collect()
}

#[wasm_bindgen]
pub fn wasm_fibon(n:u32) -> u64{
    match n {
        0 => 0,
        1 => 1,
        _ => {
            let mut a  = 0;
            let mut b = 1;

            for _ in 2..=n {
                let tmp = a + b;
                a = b;
                b = tmp;
            }
            b
        }
    }
}
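
To see where the extra work on the Rust side comes from, the sketch below contrasts the copy-out signature used above with an in-place variant. It is plain Rust without the #[wasm_bindgen] attribute so that it can run natively; add_two_copy and add_two_in_place are hypothetical names and are not part of the benchmark.

```rust
/// Same shape as `wasm_modify_array`: borrows the input and
/// allocates a fresh Vec for the result on every call.
pub fn add_two_copy(arr: &[f32]) -> Vec<f32> {
    arr.iter().map(|&x| x + 2.0).collect()
}

/// In-place variant: no allocation, the caller's buffer is updated directly.
pub fn add_two_in_place(arr: &mut [f32]) {
    for x in arr.iter_mut() {
        *x += 2.0;
    }
}
```

The first form allocates a new Vec on every call, while the second only touches the caller's buffer, which is the idea behind the raw pointer functions in the next section.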

3. Raw Wasm

This section demonstrates a low-level, raw WebAssembly approach to modifying arrays in Rust, including both a scalar version and a SIMD-accelerated version. In Rust, the functions use raw pointers (*mut f32) and slice conversion (from_raw_parts_mut) to manipulate memory directly.

Note: To enable SIMD, RUSTFLAGS="-C target-feature=+simd128" is required at compile time.

Note: unsafe blocks are required because this approach bypasses Rust’s usual memory safety guarantees.

Note: Not all browsers support SIMD by default, though support is widespread in modern versions of major browsers like Chrome, Firefox, Edge, and Safari. Older browser versions or certain less common browsers may lack support.

#[unsafe(no_mangle)]
pub unsafe extern "C" fn pure_wasm_modify_array(ptr: *mut f32, len:usize ) {
    let slice: &mut [f32] = unsafe { std::slice::from_raw_parts_mut(ptr, len) }; 

    for item in slice.iter_mut() {
        *item += 2.0f32
    }
}

#[cfg(target_arch = "wasm32")]
use std::arch::wasm32::*;

#[cfg(target_arch = "wasm32")]
#[target_feature(enable = "simd128")]
#[unsafe(no_mangle)]
pub unsafe extern "C" fn pure_wasm_modify_array_simd(ptr: *mut f32, len: usize) {
    let slice = unsafe { std::slice::from_raw_parts_mut(ptr, len)};
    let chunks = len & !3;

    for i in (0..chunks).step_by(4) {
        let v = unsafe { v128_load(slice.as_ptr().add(i) as *const v128) };
        let two = f32x4_splat(2.0);
        let res = f32x4_add(v, two);
        unsafe { v128_store(slice.as_mut_ptr().add(i) as *mut v128, res) };
    }

    for i in chunks..len {
        unsafe { *slice.get_unchecked_mut(i) += 2.0f32 };
    }
}


#[unsafe(no_mangle)]
pub unsafe extern "C" fn pure_wasm_fibon(n:u32) -> u64{
    match n {
        0 => 0,
        1 => 1,
        _ => {
            let mut a = 0;
            let mut b = 1;

            for _ in 2..=n {
                let tmp = a + b;
                a = b;
                b = tmp;
            }
            b
        }
    }
}
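
The split used by the SIMD version above (len & !3 for the vector part, then a scalar tail) can be tried out without intrinsics. The following is a minimal sketch in portable Rust; add_two_chunked is a hypothetical name, and the inner loop only stands in for the v128_load / f32x4_add / v128_store sequence.

```rust
/// Adds 2.0 to every element, handling groups of four first and the
/// leftover tail afterwards. Plain Rust stands in for the SIMD
/// intrinsics, so this runs on any target.
pub fn add_two_chunked(data: &mut [f32]) {
    // Round the length down to a multiple of 4 (clears the two low bits).
    let chunks = data.len() & !3;
    let (body, tail) = data.split_at_mut(chunks);

    // Each 4-element group corresponds to one 128-bit load/add/store.
    for lane_group in body.chunks_exact_mut(4) {
        for x in lane_group.iter_mut() {
            *x += 2.0;
        }
    }

    // The remaining 0 to 3 elements are processed one by one.
    for x in tail.iter_mut() {
        *x += 2.0;
    }
}
```

For a length of 10, the first 8 elements go through the four-wide path and the last 2 through the scalar tail.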

Results

These results are plotted in the graph, illustrating how raw WASM, especially when combined with SIMD, consistently outperforms higher-level interfaces for computationally intensive tasks.

Figure 1: The average execution time, calculated from 1,000 iterations across 50 runs, shows that raw WASM with SIMD achieved the shortest execution time for the array task.

Note: The Fibonacci timings are around a microsecond or less, so the differences between approaches are not meaningful.

Table 1: Low-level WASM implementations provide both high speed and stable execution times across repeated runs.

Approach	Task	Avg (ms)	Min (ms)	Max (ms)
Pure JS	Modify Array	1.403	1.341	1.643
wasm-bindgen	Modify Array	1.623	1.550	1.850
Raw WASM	Modify Array	0.353	0.353	0.357
Raw WASM (SIMD)	Modify Array	0.231	0.230	0.233
Pure JS	Fibonacci	0.00120	0.00060	0.00365
wasm-bindgen	Fibonacci	0.00021	0.00015	0.00030
Raw WASM	Fibonacci	0.00019	0.00015	0.00025

Discussion

In array modification, Raw WASM (0.353 ms) is about 4× faster than Pure JS (1.403 ms). With SIMD (0.231 ms), this improves to about 6× faster than Pure JS. The wasm-bindgen version (1.623 ms) is roughly 16% slower than Pure JS, which is consistent with the copying overhead described earlier. For the Fibonacci calculation, all approaches complete extremely quickly, and the differences are practically negligible.
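
The ratios quoted here can be recomputed from the averages in Table 1. A small sketch, assuming the first row is the baseline; the constant and function names are hypothetical:

```rust
/// Average "Modify Array" times in milliseconds, copied from Table 1.
pub const MODIFY_ARRAY_AVG_MS: [(&str, f64); 4] = [
    ("Pure JS", 1.403),
    ("wasm-bindgen", 1.623),
    ("Raw WASM", 0.353),
    ("Raw WASM (SIMD)", 0.231),
];

/// Speedup of each row relative to the first row (the baseline).
/// Values above 1.0 are faster than the baseline, values below 1.0 are slower.
pub fn speedups_vs_baseline(rows: &[(&'static str, f64)]) -> Vec<(&'static str, f64)> {
    match rows.first() {
        Some(&(_, baseline)) => rows
            .iter()
            .map(|&(name, ms)| (name, baseline / ms))
            .collect(),
        None => Vec::new(),
    }
}
```

With the table values this gives roughly 0.86 for wasm-bindgen, 3.97 for Raw WASM, and 6.07 for Raw WASM with SIMD.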

Conclusion

Note: These conclusions are based on a small-scale benchmark and should be interpreted cautiously.

Use pure JavaScript for simplicity when performance is “good enough.”
Use wasm-bindgen for convenience when integrating Rust logic into JS-heavy projects.
Raw WASM exports offer higher performance, particularly for large or compute-intensive operations, but require more careful memory management and lower-level coding.
SIMD instructions further improve performance for workloads that are highly parallelizable.

Top comments (3)

Tristan Hoy • Oct 22

We noticed exactly the same thing! The underlying cause is all the dynamic allocation marshalling code that wasm-bindgen generates on both sides.

Our results were exactly the same as yours: JS + wasm-bindgen was only marginally faster than pure JS.

This basically nuked our project, so we built an alternative to wasm-bindgen that's built on top of the same direct buffer mutation approach you've used here.

I'd be interested to know if this helps you solve your problem at all: news.ycombinator.com/item?id=45664341

Alexander Girke • Sep 29

Interesting comparison. Can you tell what made the wasm-bindgen solution perform so badly? Is it just the data copying, or is the tool doing some more magic that degrades performance?

Bence Rácz • Sep 30

Thanks for the question!

I have a use case where my JS code creates a Float32Array of length 1M and then passes it to a wasm-compiled Rust function, which expects a &[f32] slice. If I do this the straightforward way using wasm-bindgen, I run into unnecessary memory usage and copying:
1. JS creates the Float32Array.
2. JS calls the wasm-bindgen-generated binding function and passes that array.
3. The binding wrapper allocates a new buffer in wasm memory and copies the contents of the array. Now I have roughly double the memory in use.
4. The underlying wasm function runs using the copied buffer.
5. The binding function deallocates the wasm memory copy when done.

The performance hit comes from both allocating the extra buffer and copying the data.
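
One way to avoid steps 3 to 5 is for the module to expose its own allocator, so that JS fills the buffer directly in wasm memory and passes only a pointer. A minimal sketch, assuming two hypothetical exports named alloc_f32 and free_f32 (they are not part of the benchmark code in the article):

```rust
/// Hypothetical export: allocates `len` zero-initialised f32 values in the
/// module's memory and returns a pointer to the first one.
/// Ownership of the buffer passes to the caller.
#[unsafe(no_mangle)]
pub extern "C" fn alloc_f32(len: usize) -> *mut f32 {
    let boxed: Box<[f32]> = vec![0.0f32; len].into_boxed_slice();
    Box::into_raw(boxed) as *mut f32
}

/// Hypothetical export: frees a buffer previously returned by `alloc_f32`.
///
/// # Safety
/// `ptr` and `len` must come from one earlier `alloc_f32(len)` call,
/// and the buffer must not be used after this call.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn free_f32(ptr: *mut f32, len: usize) {
    let raw: *mut [f32] = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(raw) });
}
```

On the JS side, a view such as new Float32Array(memory.buffer, ptr, len) over the module's exported memory can then be filled in place and handed to pure_wasm_modify_array by pointer. The view has to be recreated if the wasm memory grows, because growing replaces the underlying buffer.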