<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: gen</title>
    <description>The latest articles on DEV Community by gen (@prospectorlabs).</description>
    <link>https://dev.to/prospectorlabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938157%2F509120e2-78f1-4ec3-87bb-3a6fcdde6532.png</url>
      <title>DEV Community: gen</title>
      <link>https://dev.to/prospectorlabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prospectorlabs"/>
    <language>en</language>
    <item>
      <title>I hid an entire webpage inside a cat face</title>
      <dc:creator>gen</dc:creator>
      <pubDate>Mon, 18 May 2026 13:46:38 +0000</pubDate>
      <link>https://dev.to/prospectorlabs/i-hid-an-entire-webpage-inside-a-cat-face-5gc9</link>
      <guid>https://dev.to/prospectorlabs/i-hid-an-entire-webpage-inside-a-cat-face-5gc9</guid>
      <description>&lt;p&gt;The source of this page is just a cat face:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://nothing-to-see-here.surge.sh/" rel="noopener noreferrer"&gt;https://nothing-to-see-here.surge.sh/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;View source. You'll see (=･ω･=) and nothing else.&lt;br&gt;
But the page runs. Rainbow animation, layout, everything.&lt;/p&gt;

&lt;p&gt;All of the page's JavaScript is encoded as invisible Unicode &lt;br&gt;
Variation Selectors attached to the cat face.&lt;/p&gt;

&lt;p&gt;The 256 Unicode Variation Selectors (U+FE00–U+FE0F and &lt;br&gt;
U+E0100–U+E01EF) map one-to-one onto the 256 byte values. Any byte &lt;br&gt;
sequence can ride inside normal text, invisible to readers, &lt;br&gt;
and survive copy-paste across Slack, X, LINE, and iMessage.&lt;/p&gt;
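
&lt;p&gt;A minimal sketch of that mapping in Python, assuming bytes 0 to 15 go to U+FE00–U+FE0F and bytes 16 to 255 go to U+E0100–U+E01EF. The function names are hypothetical, and this is not necessarily how subtext itself lays out the bytes:&lt;/p&gt;

```python
# Illustrative only: hide bytes in variation selectors appended to a carrier string.
# Assumed mapping: byte 0-15 -> U+FE00..U+FE0F, byte 16-255 -> U+E0100..U+E01EF.

def byte_to_vs(b: int) -> str:
    """Map one byte value to its (assumed) variation selector."""
    return chr(0xFE00 + b) if b in range(16) else chr(0xE0100 + b - 16)

def vs_to_byte(ch: str):
    """Inverse mapping; returns None for characters that are not variation selectors."""
    cp = ord(ch)
    if cp in range(0xFE00, 0xFE10):
        return cp - 0xFE00
    if cp in range(0xE0100, 0xE01F0):
        return cp - 0xE0100 + 16
    return None

def encode(carrier: str, payload: bytes) -> str:
    """Append one invisible selector per payload byte after the visible carrier."""
    return carrier + "".join(byte_to_vs(b) for b in payload)

def decode(text: str) -> bytes:
    """Collect every variation selector in the text and turn it back into bytes."""
    decoded = (vs_to_byte(ch) for ch in text)
    return bytes(b for b in decoded if b is not None)

hidden = encode("(=･ω･=)", "alert('hi')".encode("utf-8"))
print(hidden)          # typically renders as the cat face alone
print(decode(hidden))  # the original payload bytes
```

&lt;p&gt;The round trip works for any byte sequence because each of the 256 values gets its own selector under this assumed mapping.&lt;/p&gt;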

&lt;p&gt;That's subtext.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://prospectorlabs.dev/subtext" rel="noopener noreferrer"&gt;https://prospectorlabs.dev/subtext&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>unicode</category>
      <category>webdev</category>
      <category>showdev</category>
    </item>
    <item>
      <title>267 tok/s local inference on RTX 5090 – llama.cpp MTP + Qwen3-35B-A3B MoE</title>
      <dc:creator>gen</dc:creator>
      <pubDate>Mon, 18 May 2026 12:56:14 +0000</pubDate>
      <link>https://dev.to/prospectorlabs/267-toks-local-inference-on-rtx-5090-llamacpp-mtp-qwen3-35b-a3b-moe-2m6p</link>
      <guid>https://dev.to/prospectorlabs/267-toks-local-inference-on-rtx-5090-llamacpp-mtp-qwen3-35b-a3b-moe-2m6p</guid>
      <description>&lt;p&gt;Been running Qwen3-35B-A3B (MoE) with llama.cpp's Multi-Token Prediction &lt;br&gt;
(MTP / speculative decoding) on an RTX 5090 under WSL2. Results surprised me:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ollama stock (35B MoE)&lt;/td&gt;
&lt;td&gt;171 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;27B Dense + MTP&lt;/td&gt;
&lt;td&gt;104 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;35B MoE + MTP&lt;/td&gt;
&lt;td&gt;267 tok/s  ← this&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For context: Claude Haiku runs ~150 tok/s via API and is billed per token.&lt;br&gt;
This setup runs on electricity only.&lt;/p&gt;

&lt;p&gt;The interesting finding is that MoE and speculative decoding have unusual &lt;br&gt;
synergy. With a dense model, MTP gave a modest speedup (or none). &lt;br&gt;
With MoE, it nearly doubled throughput.&lt;/p&gt;

&lt;p&gt;My hypothesis: MoE's sparse activation pattern leaves compute headroom that &lt;br&gt;
speculative decoding can exploit. The draft tokens are cheap to verify because &lt;br&gt;
most experts stay inactive during verification passes.&lt;/p&gt;

&lt;p&gt;Setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RTX 5090, WSL2 (Ubuntu 24)&lt;/li&gt;
&lt;li&gt;llama.cpp with MTP draft, n-max 2&lt;/li&gt;
&lt;li&gt;Qwen3-35B-A3B-Instruct Q4_K_XL&lt;/li&gt;
&lt;li&gt;ctx 65536, OpenAI-compatible API on localhost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy to share the exact llama-server launch flags if anyone wants to reproduce.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>machinelearning</category>
      <category>llama</category>
      <category>gpu</category>
    </item>
  </channel>
</rss>
