I built a browser AI client that runs 9 LLMs on your GPU — no install, no cloud

#webdev #javascript #ai #opensource

A few months ago I asked myself: what if an AI model could run entirely in the browser — on the user's own GPU, with zero installation?

Today that's live at miningmai.com/mining.

What it does

Open Chrome, pick a model, click Start — and your GPU starts running real AI inference locally. No backend. No API calls. No data leaves your machine.

The tech stack

WebGPU — runs AI shader programs directly on the GPU via browser API
ONNX Runtime Web — executes quantized neural networks with WebGPU backend
WebAssembly — CPU fallback when WebGPU isn't available
Pure JavaScript — zero server-side code

How inference works

// Load ONNX model with WebGPU backend
const session = await ort.InferenceSession.create(modelUrl, {
  executionProviders: ['webgpu'],
});

// Run inference — everything stays on device
const output = await session.run(inputs);

Every token is generated as a GPU shader operation. The browser streams output back to the UI in real time.

9 models supported

Model	Size	Highlights
Gemma 4 E4B	~2.5 GB	Google 2026, multimodal, sees images
Qwen3 4B	~2.5 GB	Built-in thinking mode
Phi-4 Mini 3.8B	~3.4 GB	Microsoft, 128K context
Llama 3.2 3B	~2.4 GB	Meta, 128K context
SmolLM2 1.7B	~1 GB	Most stable, works on 2GB VRAM
DeepSeek R1 1.5B	~1.3 GB	Reasoning mode
Qwen3 0.6B	~500 MB	Fast, thinking mode
SmolLM2 360M	~260 MB	Web3 Detective — on-chain analysis via Helius API
Gemma 4 E2B	~2 GB	Lighter multimodal

All models are open-source — Apache 2.0 / MIT / Llama Community.

Special features

Vision AI — attach an image and ask questions about it (Gemma 4)

Thinking mode — watch the model reason step by step before answering (Qwen3, DeepSeek R1)

Web3 Detective — paste any Solana wallet or token address,
get an on-chain analysis powered by Helius API + SmolLM2 360M

Auto hardware detection — detects your GPU, VRAM, RAM and recommends the best model automatically

The privacy angle

Everything runs locally. Prompts never leave the device. The only network request is the one-time model download — after that the client works fully offline.

Why I built this

This is a demo of MAI Network — a DePIN project where miners share GPU compute to process AI inference requests and earn tokens. The browser client proves the concept works on real hardware before the full desktop client ships.

Right now: one GPU in a browser tab.

In production: thousands of nodes processing AI requests in parallel.