<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sven Herrmann</title>
    <description>The latest articles on DEV Community by Sven Herrmann (@thatscalaguy).</description>
    <link>https://dev.to/thatscalaguy</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F279658%2F9619cd45-8be1-4fa7-8820-afc87dbba14f.png</url>
      <title>DEV Community: Sven Herrmann</title>
      <link>https://dev.to/thatscalaguy</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thatscalaguy"/>
    <language>en</language>
    <item>
      <title>GPGPU.js: Run JavaScript on Your GPU With Zero Shader Knowledge</title>
      <dc:creator>Sven Herrmann</dc:creator>
      <pubDate>Mon, 25 May 2026 08:15:44 +0000</pubDate>
      <link>https://dev.to/thatscalaguy/gpgpujs-run-javascript-on-your-gpu-with-zero-shader-knowledge-569n</link>
      <guid>https://dev.to/thatscalaguy/gpgpujs-run-javascript-on-your-gpu-with-zero-shader-knowledge-569n</guid>
      <description>&lt;p&gt;JavaScript has a parallelism problem. The language is single-threaded by design, and while Web Workers help, they top out at a handful of cores. Meanwhile, the most powerful parallel processor in your machine — the GPU, with its thousands of cores — has been almost completely off-limits to web developers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/WebGPU_API" rel="noopener noreferrer"&gt;WebGPU&lt;/a&gt; changes that. It exposes the GPU to the browser for general-purpose compute, not just graphics. The catch? Using it directly means writing WGSL shaders, managing buffers and bind groups, handling device initialization, and orchestrating async data transfers. That's a lot of boilerplate for what is conceptually just "run this function over an array, fast."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/ThatScalaGuy/GPGPU.js" rel="noopener noreferrer"&gt;GPGPU.js&lt;/a&gt;&lt;/strong&gt; removes that boilerplate. You write plain JavaScript; it runs on the GPU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@thatscalaguy/gpgpu.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;doubled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// [2, 4, 6, 8]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No shaders. No buffers. No device setup. That &lt;code&gt;x =&amp;gt; x * 2&lt;/code&gt; arrow function is parsed and compiled to a WGSL compute shader, dispatched across the GPU, and the result comes back as a typed array.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🎮 &lt;strong&gt;Try it right now in your browser&lt;/strong&gt; — no install needed: &lt;a href="https://thatscalaguy.github.io/GPGPU.js/" rel="noopener noreferrer"&gt;thatscalaguy.github.io/GPGPU.js&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The API in 30 seconds
&lt;/h2&gt;

&lt;p&gt;The whole point is that the surface area is tiny. Here's most of it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@thatscalaguy/gpgpu.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Element-wise math (arrays or array + scalar)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;   &lt;span class="c1"&gt;// [5, 7, 9]&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;multiply&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;     &lt;span class="c1"&gt;// [10, 20, 30]&lt;/span&gt;

&lt;span class="c1"&gt;// Map with a JS arrow function — compiled to a GPU shader&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Reductions&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;        &lt;span class="c1"&gt;// 15&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;     &lt;span class="c1"&gt;// 9&lt;/span&gt;

&lt;span class="c1"&gt;// Sort (GPU bitonic sort) and prefix sum&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Matrix multiply (tiled, uses shared memory)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matmul&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;matA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;matB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;rowsA&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;colsA&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;colsB&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is &lt;code&gt;async&lt;/code&gt; because the GPU works asynchronously — you &lt;code&gt;await&lt;/code&gt; the result and get a &lt;code&gt;Float32Array&lt;/code&gt; back.&lt;/p&gt;

&lt;h2&gt;
  
  
  Arrow functions become shaders
&lt;/h2&gt;

&lt;p&gt;The magic trick is the codegen. When you pass &lt;code&gt;x =&amp;gt; x * 2 + 1&lt;/code&gt;, GPGPU.js:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Parses the function&lt;/strong&gt; and builds an intermediate representation of the expression.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Emits WGSL&lt;/strong&gt; from that IR — &lt;code&gt;x * 2 + 1&lt;/code&gt; becomes &lt;code&gt;output[idx] = (x * 2.0) + 1.0;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compiles and dispatches&lt;/strong&gt; the shader through WebGPU across thousands of cores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Returns&lt;/strong&gt; the result as a typed array.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A useful subset of JavaScript is supported inside expressions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Arithmetic: &lt;code&gt;+ - * / %&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Comparisons: &lt;code&gt;&amp;lt; &amp;gt; &amp;lt;= &amp;gt;= == !=&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Ternary: &lt;code&gt;a &amp;gt; 0 ? a : -a&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Math functions: &lt;code&gt;Math.abs&lt;/code&gt;, &lt;code&gt;Math.sqrt&lt;/code&gt;, &lt;code&gt;Math.pow&lt;/code&gt;, &lt;code&gt;Math.min&lt;/code&gt;, &lt;code&gt;Math.max&lt;/code&gt;, &lt;code&gt;Math.floor&lt;/code&gt;, &lt;code&gt;Math.ceil&lt;/code&gt;, &lt;code&gt;Math.sin&lt;/code&gt;, &lt;code&gt;Math.cos&lt;/code&gt;, &lt;code&gt;Math.tan&lt;/code&gt;, &lt;code&gt;Math.exp&lt;/code&gt;, &lt;code&gt;Math.log&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're worried about minifiers mangling your arrow functions in production, you can pass a string expression instead — it compiles to exactly the same shader:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;x * x + 1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// minifier-safe&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pipelines: keep data on the GPU
&lt;/h2&gt;

&lt;p&gt;The expensive part of GPU computing usually isn't the math — it's shuttling data back and forth across the PCIe bus between CPU and GPU. If you naively chain operations, you pay that round-trip every step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Three CPU↔GPU round-trips&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;total&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pipelines fix this. The data stays resident on the GPU between steps, and you only transfer at the start and end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ One round-trip, data stays on GPU between steps&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For any non-trivial workload, this is the difference between "the GPU is slower than the CPU" and a real speedup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Escape hatch: write your own WGSL
&lt;/h2&gt;

&lt;p&gt;The high-level API covers a lot, but sometimes you need full control. The &lt;code&gt;createKernel&lt;/code&gt; API lets you write raw WGSL while still letting GPGPU.js handle device setup, buffer pooling, and dispatch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kernel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createKernel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;workgroupSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;shader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`
    @group(0) @binding(0) var&amp;lt;storage, read&amp;gt; input0: array&amp;lt;f32&amp;gt;;
    @group(0) @binding(1) var&amp;lt;storage, read_write&amp;gt; output: array&amp;lt;f32&amp;gt;;
    @compute @workgroup_size(64)
    fn main(@builtin(global_invocation_id) gid: vec3u) {
      let idx = gid.x;
      output[idx] = input0[idx] * input0[idx];
    }
  `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;f32&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;f32&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;kernel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  It runs everywhere — even without WebGPU
&lt;/h2&gt;

&lt;p&gt;WebGPU support is good and growing, but it's not universal yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chrome 113+ / Edge 113+&lt;/li&gt;
&lt;li&gt;Firefox 141+ (Windows), 145+ (macOS)&lt;/li&gt;
&lt;li&gt;Safari 18+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GPGPU.js doesn't make you choose. When WebGPU is unavailable, &lt;strong&gt;every operation transparently falls back to a CPU implementation&lt;/strong&gt;. Your code doesn't change; it just runs slower where there's no GPU and faster where there is. That makes it safe to ship in production today without feature-detection branches scattered through your code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Built for modern JavaScript projects
&lt;/h2&gt;

&lt;p&gt;A few things that make it pleasant to actually use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript-first&lt;/strong&gt; — full type safety, zero runtime dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tree-shakeable&lt;/strong&gt; — ships both ESM and CJS; import only what you use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No build step required&lt;/strong&gt; — works straight from npm.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @thatscalaguy/gpgpu.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also create isolated instances instead of using the default singleton, and clean them up explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;GPU&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@thatscalaguy/gpgpu.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;myGpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GPU&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// ... use it ...&lt;/span&gt;
&lt;span class="nx"&gt;myGpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;destroy&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// release GPU resources when done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Where this is useful
&lt;/h2&gt;

&lt;p&gt;Anything that's "the same operation over a lot of data" is a candidate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image and signal processing (per-pixel transforms, convolutions)&lt;/li&gt;
&lt;li&gt;Numerical simulations and physics&lt;/li&gt;
&lt;li&gt;Data transforms over large arrays&lt;/li&gt;
&lt;li&gt;Linear algebra — matrix multiplication is built in&lt;/li&gt;
&lt;li&gt;ML inference building blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload is small or branch-heavy, the CPU may still win — GPU shines when you have thousands to millions of independent elements. As always: measure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;The fastest way to get a feel for it is the hosted playground — write an expression, hit run, see it execute on your actual GPU:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://thatscalaguy.github.io/GPGPU.js/" rel="noopener noreferrer"&gt;thatscalaguy.github.io/GPGPU.js&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;And the source, issues, and docs live on GitHub:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/ThatScalaGuy/GPGPU.js" rel="noopener noreferrer"&gt;github.com/ThatScalaGuy/GPGPU.js&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's MIT-licensed and contributions are welcome. If you build something with it, I'd love to hear about it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;GPGPU.js is open source under the MIT license.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webgpu</category>
      <category>javascript</category>
      <category>typescript</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
