<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MAI</title>
    <description>The latest articles on DEV Community by MAI (@miningmai).</description>
    <link>https://dev.to/miningmai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3979096%2F4eb16c44-032c-4439-abb1-8ed5fd78bd69.png</url>
      <title>DEV Community: MAI</title>
      <link>https://dev.to/miningmai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/miningmai"/>
    <language>en</language>
    <item>
      <title>I built a browser AI client that runs 9 LLMs on your GPU — no install, no cloud</title>
      <dc:creator>MAI</dc:creator>
      <pubDate>Thu, 11 Jun 2026 08:58:26 +0000</pubDate>
      <link>https://dev.to/miningmai/i-built-a-browser-ai-client-that-runs-9-llms-on-your-gpu-no-install-no-cloud-3mm</link>
      <guid>https://dev.to/miningmai/i-built-a-browser-ai-client-that-runs-9-llms-on-your-gpu-no-install-no-cloud-3mm</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcq9l2zzljk7xpkxbpjau.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcq9l2zzljk7xpkxbpjau.jpg" alt="MAI AI Client — browser AI mining demo running on GPU" width="800" height="420"&gt;&lt;/a&gt;A few months ago I asked myself: &lt;em&gt;what if an AI model could run entirely in the browser — on the user's own GPU, with zero installation?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Today that's live at &lt;a href="https://miningmai.com/mining" rel="noopener noreferrer"&gt;miningmai.com/mining&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it does
&lt;/h2&gt;

&lt;p&gt;Open Chrome, pick a model, click Start — and your GPU starts running real AI inference locally. No backend. No API calls. No data leaves your machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tech stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WebGPU&lt;/strong&gt; — runs AI shader programs directly on the GPU via browser API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ONNX Runtime Web&lt;/strong&gt; — executes quantized neural networks with WebGPU backend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WebAssembly&lt;/strong&gt; — CPU fallback when WebGPU isn't available&lt;/li&gt;
&lt;li&gt;Pure &lt;strong&gt;JavaScript&lt;/strong&gt; — zero server-side code&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How inference works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Load ONNX model with WebGPU backend&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ort&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;InferenceSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;executionProviders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;webgpu&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Run inference — everything stays on device&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every token is generated as a GPU shader operation. The browser streams output back to the UI in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  9 models supported
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Highlights&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E4B&lt;/td&gt;
&lt;td&gt;~2.5 GB&lt;/td&gt;
&lt;td&gt;Google 2026, multimodal, sees images&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3 4B&lt;/td&gt;
&lt;td&gt;~2.5 GB&lt;/td&gt;
&lt;td&gt;Built-in thinking mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Phi-4 Mini 3.8B&lt;/td&gt;
&lt;td&gt;~3.4 GB&lt;/td&gt;
&lt;td&gt;Microsoft, 128K context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.2 3B&lt;/td&gt;
&lt;td&gt;~2.4 GB&lt;/td&gt;
&lt;td&gt;Meta, 128K context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SmolLM2 1.7B&lt;/td&gt;
&lt;td&gt;~1 GB&lt;/td&gt;
&lt;td&gt;Most stable, works on 2GB VRAM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek R1 1.5B&lt;/td&gt;
&lt;td&gt;~1.3 GB&lt;/td&gt;
&lt;td&gt;Reasoning mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3 0.6B&lt;/td&gt;
&lt;td&gt;~500 MB&lt;/td&gt;
&lt;td&gt;Fast, thinking mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SmolLM2 360M&lt;/td&gt;
&lt;td&gt;~260 MB&lt;/td&gt;
&lt;td&gt;Web3 Detective — on-chain analysis via Helius API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 E2B&lt;/td&gt;
&lt;td&gt;~2 GB&lt;/td&gt;
&lt;td&gt;Lighter multimodal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All models are open-source — Apache 2.0 / MIT / Llama Community.&lt;/p&gt;

&lt;h2&gt;
  
  
  Special features
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vision AI&lt;/strong&gt; — attach an image and ask questions about it (Gemma 4)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thinking mode&lt;/strong&gt; — watch the model reason step by step before answering (Qwen3, DeepSeek R1)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web3 Detective&lt;/strong&gt; — paste any Solana wallet or token address, &lt;br&gt;
get an on-chain analysis powered by Helius API + SmolLM2 360M&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto hardware detection&lt;/strong&gt; — detects your GPU, VRAM, RAM and recommends the best model automatically&lt;/p&gt;

&lt;h2&gt;
  
  
  The privacy angle
&lt;/h2&gt;

&lt;p&gt;Everything runs locally. Prompts never leave the device. The only network request is the one-time model download — after that the client works fully offline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;This is a demo of &lt;a href="https://miningmai.com" rel="noopener noreferrer"&gt;MAI Network&lt;/a&gt; — a DePIN project where miners share GPU compute to process AI inference requests and earn tokens. The browser client proves the concept works on real hardware before the full desktop client ships.&lt;/p&gt;

&lt;p&gt;Right now: one GPU in a browser tab.&lt;br&gt;&lt;br&gt;
In production: thousands of nodes processing AI requests in parallel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;👉 &lt;a href="https://miningmai.com/mining" rel="noopener noreferrer"&gt;miningmai.com/mining&lt;/a&gt; — Chrome 113+ on desktop required&lt;/p&gt;

&lt;p&gt;Curious what GPU speed you get — drop your tokens/sec in the comments 👇&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
