A few months ago I asked myself: what if an AI model could run entirely in the browser — on the user's own GPU, with zero installation?
Today that's live at miningmai.com/mining.
What it does
Open Chrome, pick a model, click Start — and your GPU starts running real AI inference locally. No backend. No API calls. No data leaves your machine.
The tech stack
- WebGPU — runs AI shader programs directly on the GPU via browser API
- ONNX Runtime Web — executes quantized neural networks with WebGPU backend
- WebAssembly — CPU fallback when WebGPU isn't available
- Pure JavaScript — zero server-side code
How inference works
// Load ONNX model with WebGPU backend
const session = await ort.InferenceSession.create(modelUrl, {
executionProviders: ['webgpu'],
});
// Run inference — everything stays on device
const output = await session.run(inputs);
Every token is generated as a GPU shader operation. The browser streams output back to the UI in real time.
9 models supported
| Model | Size | Highlights |
|---|---|---|
| Gemma 4 E4B | ~2.5 GB | Google 2026, multimodal, sees images |
| Qwen3 4B | ~2.5 GB | Built-in thinking mode |
| Phi-4 Mini 3.8B | ~3.4 GB | Microsoft, 128K context |
| Llama 3.2 3B | ~2.4 GB | Meta, 128K context |
| SmolLM2 1.7B | ~1 GB | Most stable, works on 2GB VRAM |
| DeepSeek R1 1.5B | ~1.3 GB | Reasoning mode |
| Qwen3 0.6B | ~500 MB | Fast, thinking mode |
| SmolLM2 360M | ~260 MB | Web3 Detective — on-chain analysis via Helius API |
| Gemma 4 E2B | ~2 GB | Lighter multimodal |
All models are open-source — Apache 2.0 / MIT / Llama Community.
Special features
Vision AI — attach an image and ask questions about it (Gemma 4)
Thinking mode — watch the model reason step by step before answering (Qwen3, DeepSeek R1)
Web3 Detective — paste any Solana wallet or token address,
get an on-chain analysis powered by Helius API + SmolLM2 360M
Auto hardware detection — detects your GPU, VRAM, RAM and recommends the best model automatically
The privacy angle
Everything runs locally. Prompts never leave the device. The only network request is the one-time model download — after that the client works fully offline.
Why I built this
This is a demo of MAI Network — a DePIN project where miners share GPU compute to process AI inference requests and earn tokens. The browser client proves the concept works on real hardware before the full desktop client ships.
Right now: one GPU in a browser tab.
In production: thousands of nodes processing AI requests in parallel.
Try it
👉 miningmai.com/mining — Chrome 113+ on desktop required
Curious what GPU speed you get — drop your tokens/sec in the comments 👇
Top comments (1)
Happy to answer any questions about the WebGPU + ONNX
implementation — drop them below 👇
If you try it, curious what GPU speed (tokens/sec) you get!