DEV Community

MAI
MAI

Posted on

I built a browser AI client that runs 9 LLMs on your GPU — no install, no cloud

MAI AI Client — browser AI mining demo running on GPUA few months ago I asked myself: what if an AI model could run entirely in the browser — on the user's own GPU, with zero installation?

Today that's live at miningmai.com/mining.

What it does

Open Chrome, pick a model, click Start — and your GPU starts running real AI inference locally. No backend. No API calls. No data leaves your machine.

The tech stack

  • WebGPU — runs AI shader programs directly on the GPU via browser API
  • ONNX Runtime Web — executes quantized neural networks with WebGPU backend
  • WebAssembly — CPU fallback when WebGPU isn't available
  • Pure JavaScript — zero server-side code

How inference works

// Load ONNX model with WebGPU backend
const session = await ort.InferenceSession.create(modelUrl, {
  executionProviders: ['webgpu'],
});

// Run inference — everything stays on device
const output = await session.run(inputs);
Enter fullscreen mode Exit fullscreen mode

Every token is generated as a GPU shader operation. The browser streams output back to the UI in real time.

9 models supported

Model Size Highlights
Gemma 4 E4B ~2.5 GB Google 2026, multimodal, sees images
Qwen3 4B ~2.5 GB Built-in thinking mode
Phi-4 Mini 3.8B ~3.4 GB Microsoft, 128K context
Llama 3.2 3B ~2.4 GB Meta, 128K context
SmolLM2 1.7B ~1 GB Most stable, works on 2GB VRAM
DeepSeek R1 1.5B ~1.3 GB Reasoning mode
Qwen3 0.6B ~500 MB Fast, thinking mode
SmolLM2 360M ~260 MB Web3 Detective — on-chain analysis via Helius API
Gemma 4 E2B ~2 GB Lighter multimodal

All models are open-source — Apache 2.0 / MIT / Llama Community.

Special features

Vision AI — attach an image and ask questions about it (Gemma 4)

Thinking mode — watch the model reason step by step before answering (Qwen3, DeepSeek R1)

Web3 Detective — paste any Solana wallet or token address,
get an on-chain analysis powered by Helius API + SmolLM2 360M

Auto hardware detection — detects your GPU, VRAM, RAM and recommends the best model automatically

The privacy angle

Everything runs locally. Prompts never leave the device. The only network request is the one-time model download — after that the client works fully offline.

Why I built this

This is a demo of MAI Network — a DePIN project where miners share GPU compute to process AI inference requests and earn tokens. The browser client proves the concept works on real hardware before the full desktop client ships.

Right now: one GPU in a browser tab.

In production: thousands of nodes processing AI requests in parallel.

Try it

👉 miningmai.com/mining — Chrome 113+ on desktop required

Curious what GPU speed you get — drop your tokens/sec in the comments 👇

Top comments (1)

Collapse
 
miningmai profile image
MAI

Happy to answer any questions about the WebGPU + ONNX
implementation — drop them below 👇

If you try it, curious what GPU speed (tokens/sec) you get!