GPGPU.js: Run JavaScript on Your GPU With Zero Shader Knowledge

JavaScript has a parallelism problem. The language is single-threaded by design, and while Web Workers help, they top out at a handful of cores. Meanwhile, the most powerful parallel processor in your machine — the GPU, with its thousands of cores — has been almost completely off-limits to web developers.

WebGPU changes that. It exposes the GPU to the browser for general-purpose compute, not just graphics. The catch? Using it directly means writing WGSL shaders, managing buffers and bind groups, handling device initialization, and orchestrating async data transfers. That's a lot of boilerplate for what is conceptually just "run this function over an array, fast."

GPGPU.js removes that boilerplate. You write plain JavaScript; it runs on the GPU.

import { gpu } from "@thatscalaguy/gpgpu.js";

const doubled = await gpu.map([1, 2, 3, 4], x => x * 2); // [2, 4, 6, 8]

No shaders. No buffers. No device setup. That x => x * 2 arrow function is parsed and compiled to a WGSL compute shader, dispatched across the GPU, and the result comes back as a typed array.

🎮 Try it right now in your browser — no install needed: thatscalaguy.github.io/GPGPU.js

The API in 30 seconds

The whole point is that the surface area is tiny. Here's most of it:

import { gpu } from "@thatscalaguy/gpgpu.js";

// Element-wise math (arrays or array + scalar)
await gpu.add([1, 2, 3], [4, 5, 6]);   // [5, 7, 9]
await gpu.multiply([1, 2, 3], 10);     // [10, 20, 30]

// Map with a JS arrow function — compiled to a GPU shader
await gpu.map(data, x => Math.sqrt(x) + 1);

// Reductions
await gpu.sum([1, 2, 3, 4, 5]);        // 15
await gpu.max([3, 1, 4, 1, 5, 9]);     // 9

// Sort (GPU bitonic sort) and prefix sum
await gpu.sort(data);
await gpu.scan(data);

// Matrix multiply (tiled, uses shared memory)
await gpu.matmul(matA, matB, { rowsA: 64, colsA: 64, colsB: 64 });

Everything is async because work is submitted to the GPU and read back asynchronously: you await each call, and array results come back as a Float32Array.
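The "GPU bitonic sort" named in the listing above is a sorting network: a fixed schedule of compare-and-swap steps in which every pair within a stage is independent of the others, which is why it maps well onto GPU threads. Here is a minimal CPU simulation of that network, just to show the access pattern. It is a sketch, not GPGPU.js's implementation; the function name and the choice to pad with Infinity up to a power-of-two length are assumptions made for illustration.

```javascript
// CPU simulation of a bitonic sorting network (ascending order).
// Illustrative only: bitonicSort is a hypothetical name, not a GPGPU.js internal.
function bitonicSort(values) {
  // Pad to the next power of two with +Infinity so padding sorts to the end.
  let n = 1;
  while (n < values.length) n *= 2;
  const a = new Float32Array(n).fill(Infinity);
  a.set(values);

  // k: size of the bitonic sequences being merged; j: compare distance.
  for (let k = 2; k <= n; k *= 2) {
    for (let j = k / 2; j >= 1; j /= 2) {
      // On a GPU, each i in this loop would be one independent invocation.
      for (let i = 0; i < n; i++) {
        const partner = i ^ j;
        if (partner > i) {
          const ascending = (i & k) === 0;
          const outOfOrder = ascending ? a[i] > a[partner] : a[i] < a[partner];
          if (outOfOrder) {
            const tmp = a[i];
            a[i] = a[partner];
            a[partner] = tmp;
          }
        }
      }
    }
  }
  // Drop the padding again.
  return a.slice(0, values.length);
}
```

The two outer loops are the sequential part; each pass of the inner loop is the part a GPU can run in parallel.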

Arrow functions become shaders

The magic trick is the codegen. When you pass x => x * 2 + 1, GPGPU.js:

Parses the function and builds an intermediate representation of the expression.
Emits WGSL from that IR: x * 2 + 1 becomes output[idx] = (x * 2.0) + 1.0; with integer literals rewritten as float literals.
Compiles and dispatches the shader through WebGPU across thousands of cores.
Returns the result as a typed array.
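To make the parse, IR, and emit steps concrete, here is a toy version of that pipeline for plain arithmetic. It is a sketch under stated assumptions, not the library's actual parser: the function names (tokenize, parse, emit, compileToWgsl) and the IR shape are hypothetical, it handles only numbers, identifiers, + - * / and parentheses, and it assumes the surrounding shader has already bound the parameter name to the current input element.

```javascript
// Toy compiler: an arithmetic expression -> a WGSL assignment statement.
// Hypothetical sketch; it is not GPGPU.js's actual parser or IR.

// Split source text into numbers, identifiers, operators, and parentheses.
function tokenize(src) {
  return src.match(/\d+\.?\d*|[A-Za-z_]\w*|[-+*/()]/g) || [];
}

// Recursive-descent parser producing a small tree (the "IR").
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function primary() {
    const t = next();
    if (t === undefined) throw new Error("unexpected end of expression");
    if (t === "(") {
      const inner = additive();
      if (next() !== ")") throw new Error("expected ')'");
      return inner;
    }
    if (/^\d/.test(t)) return { kind: "num", value: Number(t) };
    if (/^[A-Za-z_]/.test(t)) return { kind: "var", name: t };
    throw new Error(`unexpected token: ${t}`);
  }
  function multiplicative() {
    let left = primary();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const right = primary();
      left = { kind: "bin", op, left, right };
    }
    return left;
  }
  function additive() {
    let left = multiplicative();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const right = multiplicative();
      left = { kind: "bin", op, left, right };
    }
    return left;
  }

  const tree = additive();
  if (pos < tokens.length) throw new Error(`unexpected token: ${peek()}`);
  return tree;
}

// Walk the tree and print WGSL; integers become float literals (2 -> 2.0).
function emit(node, topLevel = false) {
  if (node.kind === "num") {
    return Number.isInteger(node.value) ? node.value.toFixed(1) : String(node.value);
  }
  if (node.kind === "var") return node.name;
  const text = `${emit(node.left)} ${node.op} ${emit(node.right)}`;
  return topLevel ? text : `(${text})`;
}

function compileToWgsl(body) {
  return `output[idx] = ${emit(parse(tokenize(body)), true)};`;
}
```

Calling compileToWgsl("x * 2 + 1") returns the string output[idx] = (x * 2.0) + 1.0; which matches the example in the steps above. A real compiler also needs unary minus, comparisons, ternaries, and function calls, all of which this sketch leaves out.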

A useful subset of JavaScript is supported inside expressions:

Arithmetic: + - * / %
Comparisons: < > <= >= == !=
Ternary: a > 0 ? a : -a
Math functions: Math.abs, Math.sqrt, Math.pow, Math.min, Math.max, Math.floor, Math.ceil, Math.sin, Math.cos, Math.tan, Math.exp, Math.log
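Most of that subset has a direct WGSL counterpart. The sketch below shows one plausible lowering, assuming a compiler that works on already-lowered argument strings. The WGSL built-in names (abs, sqrt, pow, min, max, floor, ceil, sin, cos, tan, exp, log, select) are real; the helper names lowerCall and lowerTernary are hypothetical, and nothing here is taken from GPGPU.js's source.

```javascript
// One plausible lowering of the supported JS constructs to WGSL text.
// The WGSL built-in names are real; the helper names are hypothetical.
const MATH_TO_WGSL = new Map([
  ["Math.abs", "abs"], ["Math.sqrt", "sqrt"], ["Math.pow", "pow"],
  ["Math.min", "min"], ["Math.max", "max"], ["Math.floor", "floor"],
  ["Math.ceil", "ceil"], ["Math.sin", "sin"], ["Math.cos", "cos"],
  ["Math.tan", "tan"], ["Math.exp", "exp"], ["Math.log", "log"],
]);

// Turn a JS Math call into a WGSL built-in call.
function lowerCall(jsName, args) {
  const builtin = MATH_TO_WGSL.get(jsName);
  if (!builtin) throw new Error(`unsupported function: ${jsName}`);
  return `${builtin}(${args.join(", ")})`;
}

// WGSL has no ?: operator; select(ifFalse, ifTrue, condition) fills that role.
function lowerTernary(condition, ifTrue, ifFalse) {
  return `select(${ifFalse}, ${ifTrue}, ${condition})`;
}
```

Two details worth knowing: select takes the false value first, and WGSL's min and max accept exactly two arguments, unlike the variadic Math.min and Math.max.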

If you're worried about minifiers mangling your arrow functions in production, you can pass a string expression instead — it compiles to exactly the same shader:

await gpu.map(data, "x * x + 1"); // minifier-safe
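Why would a minifier matter at all? A library that compiles arrow functions has to read their source text, and the standard way to do that is Function.prototype.toString(). After minification that text is whatever the minifier produced, with parameters possibly renamed and the expression possibly rewritten. The helper below is a minimal sketch of that normalization step; it assumes the library reads functions through toString(), which the article does not confirm, and toExpression is a hypothetical name.

```javascript
// Split an arrow function or a bare string expression into parameter + body.
// Hypothetical helper: the article does not show how GPGPU.js reads its input.
function toExpression(fnOrString, defaultParam = "x") {
  if (typeof fnOrString === "string") {
    // A string is taken as-is, so no build tool can alter it.
    return { param: defaultParam, body: fnOrString.trim() };
  }
  // toString() returns the function's source text as it exists at runtime.
  const src = fnOrString.toString();
  const match = src.match(/^\s*\(?\s*([A-Za-z_$][\w$]*)\s*\)?\s*=>\s*([\s\S]+)$/);
  if (!match) throw new Error(`unsupported function shape: ${src}`);
  return { param: match[1], body: match[2].trim() };
}
```

Both toExpression(x => x * x + 1) and toExpression("x * x + 1") describe the same computation, but only the string form is guaranteed to reach the compiler exactly as you typed it.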

Pipelines: keep data on the GPU

The expensive part of GPU computing often isn't the math; it's moving data back and forth between CPU and GPU memory (across the PCIe bus on a discrete card). If you naively chain operations, you pay that round-trip at every step:

// ❌ Three CPU↔GPU round-trips
let r = await gpu.map(data, x => x * 2);
r = await gpu.map(r, x => x + 1);
const total = await gpu.sum(r);

Pipelines fix this. The data stays resident on the GPU between steps, and you only transfer at the start and end:

// ✅ One round-trip, data stays on GPU between steps
const result = await gpu.pipeline()
  .map(x => x * 2)
  .map(x => x + 1)
  .reduce((a, b) => a + b, 0)
  .run(data);

For chained operations over large arrays, this can be the difference between "the GPU is slower than the CPU" and a real speedup.
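To see where the savings come from, here is a toy model that only counts transfers. No GPU work happens in it: FakeDevice, eagerMap, and pipeline are hypothetical names that mirror the shape of the API above, and the model covers map steps only.

```javascript
// Toy model that counts CPU<->GPU transfers; no real GPU work happens here.
// All names are hypothetical and only mirror the shape of the article's API.
class FakeDevice {
  constructor() {
    this.transfers = 0;
  }
  upload(data) {
    this.transfers++;
    return Float32Array.from(data);
  }
  download(buffer) {
    this.transfers++;
    return Float32Array.from(buffer);
  }
}

// Eager style: each call uploads its input and downloads its output.
function eagerMap(device, data, fn) {
  const onGpu = device.upload(data);
  return device.download(onGpu.map(fn));
}

// Pipeline style: record the steps, then upload once and download once.
function pipeline(device) {
  const steps = [];
  return {
    map(fn) {
      steps.push((buf) => buf.map(fn));
      return this; // allow chaining
    },
    run(data) {
      let buf = device.upload(data);
      for (const step of steps) buf = step(buf); // stays "on the device"
      return device.download(buf);
    },
  };
}
```

In this model, two chained eagerMap calls cost four transfers, while a two-step pipeline costs two no matter how many steps you add.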

Escape hatch: write your own WGSL

The high-level API covers a lot, but sometimes you need full control. The createKernel API lets you write raw WGSL while still letting GPGPU.js handle device setup, buffer pooling, and dispatch:

const kernel = await gpu.createKernel({
  workgroupSize: 64,
  shader: `
    @group(0) @binding(0) var<storage, read> input0: array<f32>;
    @group(0) @binding(1) var<storage, read_write> output: array<f32>;
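    @compute @workgroup_size(64) // same value as workgroupSize above
    fn main(@builtin(global_invocation_id) gid: vec3u) {
      let idx = gid.x;
      // Skip invocations past the end if the dispatch rounds up.
      if (idx >= arrayLength(&output)) { return; }
      output[idx] = input0[idx] * input0[idx];
    }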
  `,
  inputs: [{ type: "f32", size: 1024 }],
  output: { type: "f32", size: 1024 },
});

// inputData holds 1024 f32 values, matching the size declared in inputs.
const result = await kernel.run(inputData);
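When you write the shader yourself, it helps to know how many invocations will actually run. WebGPU dispatches whole workgroups, so the element count is rounded up to a multiple of the workgroup size. The arithmetic below is standard for compute dispatch in general; whether GPGPU.js computes its dispatch exactly this way is an assumption, and both function names are hypothetical.

```javascript
// Number of workgroups needed to cover n elements (rounds up).
function workgroupCount(n, workgroupSize) {
  return Math.ceil(n / workgroupSize);
}

// Invocations that land past the end of the data.
// If this is non-zero, the shader should check idx against the array length.
function overshoot(n, workgroupSize) {
  return workgroupCount(n, workgroupSize) * workgroupSize - n;
}
```

For the kernel above, workgroupCount(1024, 64) is 16 with no overshoot. With 1000 elements it is still 16 workgroups, but 24 invocations would index past the end of the data.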

It runs everywhere — even without WebGPU

WebGPU support is good and growing, but it's not universal yet:

Chrome 113+ / Edge 113+
Firefox 141+ (Windows), 145+ (macOS)
Safari 26+ (earlier versions only behind a feature flag)

GPGPU.js doesn't make you choose. When WebGPU is unavailable, every operation transparently falls back to a CPU implementation. Your code doesn't change; it runs on the CPU where WebGPU is missing and on the GPU where it is available. That makes it safe to ship in production today without feature-detection branches scattered through your code.
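For reference, this is roughly the check such a fallback saves you from writing yourself. navigator.gpu and requestAdapter() are the real WebGPU entry points; the wrapper functions hasWebGPU and pickBackend are hypothetical, and this is a sketch of the general pattern rather than GPGPU.js's own detection logic.

```javascript
// Resolves to true only if the WebGPU entry point exists and yields an adapter.
// `nav` is injectable so the check can be exercised outside a browser.
async function hasWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;
  try {
    // requestAdapter() resolves to null when no suitable adapter is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}

// Hypothetical dispatcher: pick a GPU or CPU implementation once, up front.
async function pickBackend(gpuImpl, cpuImpl, nav = globalThis.navigator) {
  return (await hasWebGPU(nav)) ? gpuImpl : cpuImpl;
}
```

Note that the presence of navigator.gpu alone is not enough: a browser can expose the API and still return no adapter, for example on blocklisted hardware.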

Built for modern JavaScript projects

A few things that make it pleasant to actually use:

TypeScript-first — full type safety, zero runtime dependencies.
Tree-shakeable — ships both ESM and CJS; import only what you use.
No build step required — works straight from npm.

npm install @thatscalaguy/gpgpu.js

You can also create isolated instances instead of using the default singleton, and clean them up explicitly:

import { GPU } from "@thatscalaguy/gpgpu.js";

const myGpu = new GPU();
// ... use it ...
myGpu.destroy(); // release GPU resources when done

Where this is useful

Anything that's "the same operation over a lot of data" is a candidate:

Image and signal processing (per-pixel transforms, convolutions)
Numerical simulations and physics
Data transforms over large arrays
Linear algebra — matrix multiplication is built in
ML inference building blocks

If your workload is small or branch-heavy, the CPU may still win; the GPU shines when you have thousands to millions of independent elements. As always: measure.
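A small timing helper is enough to get started. The sketch below takes the median of several runs so one slow outlier doesn't skew the number; medianMs is a hypothetical name, and it relies only on the standard performance.now() timer.

```javascript
// Median wall-clock time (in ms) of a sync or async function over several runs.
async function medianMs(fn, runs = 5) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn(); // works for plain functions and for promises
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}
```

Time the GPU call and a plain CPU loop over the same input at a few array sizes, and consider discarding the first GPU run, which typically also pays for one-time setup such as shader compilation.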