Todd Tanner

Posted on May 23 • Edited on Jun 27

A Neural Network Engine in C# That Runs in Your Browser - No ONNX Runtime, No JavaScript Bridge, No Native Binaries

#csharp #machinelearning #webassembly #blazor

Eight months ago, the creator of ILGPU told me that supporting Blazor WebAssembly as a backend would be too difficult.

Today I shipped SpawnDev.ILGPU.ML 4.0.0-preview.4 to NuGet. It runs neural networks in your browser, on your laptop, and on your server - from a single C# codebase. Six backends: WebGPU, WebGL, WebAssembly, CUDA, OpenCL, and CPU. No ONNX Runtime. No JavaScript bridge. No native binaries to install. Just C# that gets transpiled to whatever shader language the target needs.

This article is about how that happened, what works today, and an honest ask for the help that would let me keep going at full speed.

What works today, in your browser, right now

The library ships five inference pipelines that have been validated end-to-end on every backend. Each of these images is a screenshot from the live demo at lostbeard.github.io/SpawnDev.ILGPU.ML - the model output rendered directly from a GPU buffer to an HTML <canvas> via the library's ICanvasRenderer. No PNG encoding step, no base64 data URL, no host readback of pixel data.

Image classification - SqueezeNet 1.1

Drop in an image, get top-K labels and confidences. The ONNX file is loaded from Hugging Face's CDN, cached in the browser's Origin Private File System (OPFS) for subsequent visits, parsed into a graph, and dispatched to whichever backend the page is running on. No server. No upload. The image never leaves the device.
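
For readers who want to see what "top-K labels and confidences" means numerically, here is a minimal CPU-side sketch of the post-processing step: a softmax over the raw logits followed by a top-K selection. The names TopKSketch, Softmax and TopK are mine for illustration and are not the library's API; the library may well do this on the accelerator instead.

```csharp
using System;
using System.Linq;

public static class TopKSketch
{
    // Numerically stable softmax: subtract the max logit before exponentiating.
    public static float[] Softmax(float[] logits)
    {
        float max = logits.Max();
        double[] exps = logits.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }

    // Returns the k most probable (class index, probability) pairs, best first.
    public static (int Index, float Probability)[] TopK(float[] logits, int k)
    {
        float[] probs = Softmax(logits);
        return probs
            .Select((p, i) => (Index: i, Probability: p))
            .OrderByDescending(t => t.Probability)
            .Take(k)
            .ToArray();
    }
}
```

Mapping each Index to a human-readable label is then a lookup into whatever label list ships with the model.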

Monocular depth estimation - Depth Anything V2 Small (95MB)

The weights are streamed one tensor at a time so the 95MB model doesn't blow up the WASM heap. The output is upscaled back to the source image's exact aspect ratio via a GPU bilinear resize kernel, then run through a piecewise-linear colormap kernel (plasma / viridis / inferno / grayscale). Switching palettes is one accelerator dispatch, with no re-inference.
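
A piecewise-linear colormap is a pure per-pixel function of depth, which is why a palette switch needs no re-inference. The sketch below shows the idea on the CPU under some assumptions: depth is already normalized to [0, 1], and the three control points are made-up values for illustration, not the real plasma or viridis tables. ColormapSketch and Map are hypothetical names, not the library's.

```csharp
using System;

public static class ColormapSketch
{
    // Illustrative control points (position in [0,1] -> RGB in [0,1]).
    // Made-up stops for the sketch, NOT the real plasma/viridis tables.
    static readonly (float T, float R, float G, float B)[] Stops =
    {
        (0.0f, 0f, 0f, 0.5f),
        (0.5f, 1f, 0f, 0f),
        (1.0f, 1f, 1f, 0f),
    };

    // Maps a normalized depth value to a colour by linear interpolation
    // between the two control points that bracket it.
    public static (float R, float G, float B) Map(float depth)
    {
        float t = Math.Clamp(depth, 0f, 1f);
        for (int i = 0; i < Stops.Length - 1; i++)
        {
            var a = Stops[i];
            var b = Stops[i + 1];
            if (t <= b.T)
            {
                float u = (t - a.T) / (b.T - a.T);
                return (a.R + u * (b.R - a.R),
                        a.G + u * (b.G - a.G),
                        a.B + u * (b.B - a.B));
            }
        }
        var last = Stops[^1];
        return (last.R, last.G, last.B);
    }
}
```

In a GPU version each thread would evaluate this function for one pixel, so changing the palette only means swapping the control-point table and dispatching again.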

Neural style transfer - Mosaic

The classic feed-forward fast style transfer model (a single forward pass per image, rather than Gatys-style iterative optimization), running entirely client-side. The result is rendered straight to a <canvas> via the library's GPU-direct renderer.

Background removal - RMBG-1.4

Salient-object segmentation in the browser. The mask is computed on the accelerator, applied to the source image's alpha channel on the accelerator, and composited (transparent / white / blur background options) without any CPU loop ever touching pixel data.
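
The per-pixel arithmetic behind the transparent and white background options is small. The sketch below is a CPU reference only, written to show what each accelerator thread would compute for one pixel; the library itself does this on the GPU. MaskSketch and its methods are hypothetical names, and I am assuming interleaved 8-bit RGBA plus one mask value in [0, 1] per pixel.

```csharp
using System;

public static class MaskSketch
{
    // Transparent background: scale each pixel's alpha by its mask value.
    public static void ApplyMaskToAlpha(byte[] rgba, float[] mask)
    {
        if (rgba.Length != mask.Length * 4)
            throw new ArgumentException("Expected four bytes per mask value.");
        for (int p = 0; p < mask.Length; p++)
        {
            float m = Math.Clamp(mask[p], 0f, 1f);
            rgba[p * 4 + 3] = (byte)MathF.Round(rgba[p * 4 + 3] * m);
        }
    }

    // White background: blend each colour channel toward 255 by (1 - mask)
    // and make the result fully opaque.
    public static void CompositeOnWhite(byte[] rgba, float[] mask)
    {
        if (rgba.Length != mask.Length * 4)
            throw new ArgumentException("Expected four bytes per mask value.");
        for (int p = 0; p < mask.Length; p++)
        {
            float m = Math.Clamp(mask[p], 0f, 1f);
            for (int ch = 0; ch < 3; ch++)
            {
                float blended = rgba[p * 4 + ch] * m + 255f * (1f - m);
                rgba[p * 4 + ch] = (byte)MathF.Round(blended);
            }
            rgba[p * 4 + 3] = 255;
        }
    }
}
```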

3x super-resolution - ESPCN (tile-based)

This one was a serious refactor. The published ESPCN model takes a fixed 224x224 luminance input - a naive implementation would shrink the source to 224x224, super-res it, and call it done (lossy and grayscale). My pipeline now tiles the source into overlapping 224x224 patches, runs each through the model, accumulates them into a destination luminance plane on the accelerator with weighted averaging in the overlap regions, then combines the result with bilinear-upsampled Cb/Cr from the original RGBA. Full source resolution, full color, source aspect ratio preserved.
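
The tiling and overlap blending can be sketched in one dimension. This is my own simplified illustration, not the library's kernel: it works in a single coordinate space (for the 3x model the output-side origins and tile size would be scaled by 3), the overlap width and the tent-shaped weight are assumptions, and TilingSketch, TileOrigins, Weight and Blend are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class TilingSketch
{
    // Start offsets of tiles of size `tile` covering [0, length) with at least
    // `overlap` shared pixels between neighbours. The last tile is shifted back
    // so that it ends exactly at `length`.
    public static List<int> TileOrigins(int length, int tile, int overlap)
    {
        if (length < tile || overlap < 0 || overlap >= tile)
            throw new ArgumentException("Requires length >= tile > overlap >= 0.");
        var origins = new List<int>();
        int stride = tile - overlap;
        for (int start = 0; ; start += stride)
        {
            if (start + tile >= length)
            {
                origins.Add(length - tile);
                break;
            }
            origins.Add(start);
        }
        return origins;
    }

    // Tent weight: largest at the tile centre, smallest (but non-zero) at the edges.
    public static float Weight(int posInTile, int tile)
    {
        float centre = (tile - 1) / 2f;
        return 1f + centre - Math.Abs(posInTile - centre);
    }

    // Accumulates weighted tile outputs, then normalizes by the summed weights.
    public static float[] Blend(int length, int tile,
        IReadOnlyList<int> origins, Func<int, float[]> runTile)
    {
        var sum = new float[length];
        var weight = new float[length];
        foreach (int origin in origins)
        {
            float[] output = runTile(origin); // `tile` values for this patch
            for (int i = 0; i < tile; i++)
            {
                float w = Weight(i, tile);
                sum[origin + i] += w * output[i];
                weight[origin + i] += w;
            }
        }
        for (int i = 0; i < length; i++) sum[i] /= weight[i];
        return sum;
    }
}
```

For a 500-pixel row with 224-pixel tiles and a 32-pixel overlap, TileOrigins yields three tiles starting at 0, 192 and 276. Because every pixel is divided by the total weight it received, seams in the overlap regions are averaged rather than hard-cut.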

How it works - C# in, GPU shaders out

The library is built on top of SpawnDev.ILGPU, my fork of ILGPU that adds three browser GPU backends to the existing CUDA / OpenCL / CPU ones.

ILGPU transpiles .NET CIL into GPU shader code. The browser backends I added compile that CIL into:

WGSL (WebGPU Shading Language) - the modern GPU compute path. Texture copies straight to canvas.
GLSL (OpenGL Shading Language) - using WebGL2 Transform Feedback for compute. Works in any browser with WebGL2, which Chrome and Firefox have shipped since 2017.
WebAssembly binary - SIMD + threads, with multi-worker dispatch via SharedArrayBuffer.

A kernel like this:

private static void DoubleKernel(Index1D idx,
    TensorView<float> input, TensorView<float> output)
{
    // Decompose the flat thread index into NCHW coordinates (W varies fastest).
    int w = idx % input.D3;
    int h = (idx / input.D3) % input.D2;
    int c = (idx / (input.D3 * input.D2)) % input.D1;
    int n = idx / (input.D3 * input.D2 * input.D1);
    // One thread per element: read, double, write to the same coordinates.
    output.Set4D(n, c, h, w, input.Get4D(n, c, h, w) * 2f);
}

...becomes a WGSL compute shader in WebGPU, a GLSL vertex/fragment shader using Transform Feedback in WebGL, a Wasm function dispatched across web workers, a PTX kernel on CUDA, an OpenCL kernel on AMD/Intel desktop, or a parallel-for on the CPU. One C# function, six target backends. Picked at runtime.
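
To see why the kernel's four divisions and remainders recover NCHW coordinates, here is a plain-CPU sketch of the same arithmetic with its inverse. It assumes a contiguous row-major NCHW layout, which is what the kernel's index math implies; IndexSketch, Decompose and Flatten are names I made up for the illustration.

```csharp
using System;

public static class IndexSketch
{
    // Same arithmetic as the kernel: W varies fastest, then H, then C, then N.
    public static (int N, int C, int H, int W) Decompose(int idx, int d1, int d2, int d3)
    {
        int w = idx % d3;
        int h = (idx / d3) % d2;
        int c = (idx / (d3 * d2)) % d1;
        int n = idx / (d3 * d2 * d1);
        return (n, c, h, w);
    }

    // Inverse mapping for a contiguous row-major NCHW layout.
    public static int Flatten(int n, int c, int h, int w, int d1, int d2, int d3)
        => ((n * d1 + c) * d2 + h) * d3 + w;
}
```

For a tensor of shape 2x3x4x5, flat index 37 decomposes to n=0, c=1, h=3, w=2, and flattening those coordinates gives 37 again.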

The TensorView<float> parameter is a blittable struct that ILGPU packs into the kernel's parameter buffer alongside the standard Index1D thread coordinate. Its D0..D3 fields carry the tensor shape inline - the kernel reads dimensions from the struct rather than taking H and W as separate scalar parameters. This matters because shape management was previously the noisiest, most error-prone part of kernel authoring.
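
"Blittable" here means the struct contains only unmanaged fields, so it can be copied byte-for-byte into a parameter buffer. The sketch below illustrates that property with a hypothetical stand-in, ShapeHeader4D, which is not the library's TensorView<T> (the real type also has to reference the underlying data, which I have left out). BlittableCheck is likewise an illustrative helper.

```csharp
using System.Runtime.InteropServices;

// Hypothetical stand-in for a shape-carrying view; NOT the library's TensorView<T>.
[StructLayout(LayoutKind.Sequential)]
public readonly struct ShapeHeader4D
{
    public readonly int D0, D1, D2, D3;

    public ShapeHeader4D(int d0, int d1, int d2, int d3)
    {
        D0 = d0; D1 = d1; D2 = d2; D3 = d3;
    }

    // Total element count implied by the shape.
    public int Length => D0 * D1 * D2 * D3;
}

public static class BlittableCheck
{
    // The `unmanaged` constraint makes this compile only for types that
    // contain no managed references, i.e. types that can be copied as raw bytes.
    public static int SizeOf<T>() where T : unmanaged => Marshal.SizeOf<T>();
}
```

Four int fields occupy 16 bytes, so a shape header like this adds a small, fixed cost per tensor argument while removing the need to pass each dimension as a separate scalar.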

The Tensor API

This week's release shipped a Transformers.js / ONNX-Runtime style API surface in idiomatic C#:

using var session = await InferenceSession.CreateFromFileAsync(
    accelerator, http, "models/squeezenet/model.onnx");

// Allocate the input as an OwnedTensor - wraps a fresh GPU buffer.
using var input = OwnedTensor<float>.FromHost(
    accelerator, pixels, new[] { 1, 3, 224, 224 });

// Transformers.js-style call. Outputs come back as an OwnedTensorMap<float> -
// each output tensor lives in its own freshly-allocated GPU buffer, independent
// of the session's internal pool. The `using` disposes every output in one go.
using var outputs = await session.RunOwnedAsync(new Dictionary<string, Tensor<float>>
{
    [session.InputNames[0]] = input,
});

var logits = outputs[session.OutputNames[0]];   // OwnedTensor<float>
var hostLogits = await logits.ToHostAsync();    // copy back to CPU only when needed

There are three tensor types, mirroring the split ILGPU itself uses between MemoryBuffer<T> (host-side, lifetime-managing class) and ArrayView<T> (kernel-passable struct):

Tensor<T> - host-side, generic over T : unmanaged. Zero-copy reshape / slice / sub-tensor.
OwnedTensor<T> - IDisposable, owns a MemoryBuffer1D<T>. What pipelines return. Implicit conversions to Tensor<T> and TensorView<T> mean you never have to type .AsTensor or .View at a call site.
TensorView<T> - blittable struct, passes directly into ILGPU kernels.
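
"Zero-copy reshape / slice" means the new tensor is just a different window onto the same memory. The sketch below shows that idea with the standard Memory<T> type; HostView<T> is a hypothetical illustration of the concept and not the library's Tensor<T>, whose actual members I have not reproduced.

```csharp
using System;
using System.Linq;

// Hypothetical illustration of zero-copy views; NOT the library's Tensor<T>.
public readonly struct HostView<T> where T : unmanaged
{
    public Memory<T> Data { get; }
    public int[] Shape { get; }

    public HostView(Memory<T> data, params int[] shape)
    {
        int count = shape.Aggregate(1, (a, b) => a * b);
        if (count != data.Length)
            throw new ArgumentException("Shape does not match element count.");
        Data = data;
        Shape = shape;
    }

    // Same memory, new shape: no elements are copied.
    public HostView<T> Reshape(params int[] shape) => new HostView<T>(Data, shape);

    // Sub-tensor along the first axis: a window onto the same memory.
    public HostView<T> Slice(int index)
    {
        int inner = Data.Length / Shape[0];
        return new HostView<T>(Data.Slice(index * inner, inner), Shape.Skip(1).ToArray());
    }
}
```

Writing through a slice is visible in the original array, which is the observable difference between a view and a copy:

```csharp
var backing = new float[6];
var t = new HostView<float>(backing, 2, 3);
t.Slice(1).Data.Span[0] = 7f;   // writes to backing[3]
```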

This is the same conceptual API as Transformers.js and ONNX Runtime - the same patterns work, the same mental model carries over - implemented in a language with real type-safe generics and deterministic disposal.

Why I'm doing this

Half of the answer is technical curiosity. The other half is that the current ML-in-the-browser landscape is dominated by ONNX Runtime Web, which has a fundamental WebGPU device-sharing bug that's been ignored for six months (microsoft/onnxruntime#26107). That bug is a wall for anyone trying to ship more than one model in a single browser session. It pushed me from "I wonder if I could do this" to "I'm doing this."

The deeper motivation: when neural networks run on the user's device, the user's data stays on the user's device. No upload to your servers. No "we promise we won't train on your data." No data plane at all. The user runs the model on their hardware against their data, period.

That's what the SpawnDev stack is for. Blazor WebAssembly + WebGPU + the ML library + WebRTC peer-to-peer model delivery via SpawnDev.WebTorrent. A Progressive Web App stack where the entire AI workload lives in the user's browser, the model weights arrive over BitTorrent from other users running the same app, and nothing ever touches a centralized server. That's the destination. The library this article is announcing is one foundational layer.

Where I am with funding - and what I need

I built this library on a $20/month budget.

I am one person, with a small crew that helps when the budget allows. When it does, peak output across the entire SpawnDev stack (six libraries, hundreds of operators, multiple test suites, the WebRTC + WebTorrent + BlazorJS stack underneath) looks like 410 commits in a single day. When it doesn't, work slows to whatever evenings I can spare around a full-time job.

I'm asking for $200/month total in GitHub Sponsorships to put the full crew back on the ship.

That's the gap between this preview and the next ten:

Every remaining operator family migrated to the new Tensor API
The 11 other inference pipelines verified end-to-end on every backend
FP16 attention + Flash Attention on WebGPU
Llama and Phi-4 LLM inference in the browser
Full text-to-image diffusion through SD-Turbo
TripoSR single-image-to-3D
Peer-to-peer distributed compute through SpawnDev.WebTorrent

It is all in flight. The bottleneck is hours, not ideas.

Try it, break it, file bugs

The library is on NuGet:

dotnet add package SpawnDev.ILGPU.ML --prerelease

The source is at github.com/LostBeard/SpawnDev.ILGPU.ML. The live demos at lostbeard.github.io/SpawnDev.ILGPU.ML run entirely in your browser - the page itself is served from GitHub Pages, but the inference happens on your GPU and your data never leaves your machine.

If you're a .NET developer who's looked at the browser ML space and thought "I want this but I can't bring myself to write JavaScript," this is for you. If you're a Blazor developer who needs ML and couldn't make ONNX Runtime Web behave, this is for you. If you've been waiting for someone to prove Blazor WebAssembly can be a serious AI runtime, this is the proof.

Try the demos. File issues from your own models. Star the repo if you want to see this continue.

And if you can sponsor: github.com/sponsors/LostBeard. $5/month is a vote of confidence. $200/month total puts the crew back at warp speed.

🖖🚀
