<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Denis</title>
    <description>The latest articles on DEV Community by Denis (@remizovdenis).</description>
    <link>https://dev.to/remizovdenis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3849988%2F8c52075b-97ad-4d50-ad29-39e86e285d01.jpeg</url>
      <title>DEV Community: Denis</title>
      <link>https://dev.to/remizovdenis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/remizovdenis"/>
    <language>en</language>
    <item>
      <title>TurboQuant MoE 0.3.0</title>
      <dc:creator>Denis</dc:creator>
      <pubDate>Tue, 31 Mar 2026 17:30:43 +0000</pubDate>
      <link>https://dev.to/remizovdenis/turboquant-moe-030-bee</link>
      <guid>https://dev.to/remizovdenis/turboquant-moe-030-bee</guid>
      <description>&lt;p&gt;Key Features in v0.3.0&lt;/p&gt;

&lt;p&gt;True 3-bit PolarQuant: Physical bit-packing (8x3-bit into 3 bytes) achieving 5.8x-6.0x compression of base KV storage with &amp;lt;0.1% accuracy drop.&lt;br&gt;
Cross-Layer KV Delta (14x Compression): Next-gen backend that stores 3-bit anchor layers and 1-bit signed deltas for intermediate layers.&lt;br&gt;
Speculative KV Prefill: Accelerates prefill phase by 2-3x using 1-bit sketches for fast draft KV generation and verification.&lt;br&gt;
Temporal Expert Fusion: SVD-based merging of rarely-used experts to reclaim 20-30% of MoE weight VRAM with zero quality loss.&lt;br&gt;
Cross-Request Prefix Sharing: Global manager for sharing KV blocks of common prefixes across concurrent requests.&lt;br&gt;
Fast Walsh-Hadamard Transform (FWHT): O(N log N) rotation for faster quantization on power-of-2 dimensions.&lt;br&gt;
Cryptographic KV Watermarking: HMAC-seeded LSB watermarking of KV scales for attribution and auditing.&lt;/p&gt;
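
&lt;p&gt;The post doesn't include the packing layout, so here is a minimal Python sketch of what physical 3-bit packing means (8 three-bit codes into 3 bytes); the function names are illustrative, not TurboQuant's API:&lt;/p&gt;

```python
def pack_3bit(vals):
    # Pack 8 unsigned 3-bit codes (each 0..7) into 3 bytes: 8 * 3 = 24 bits.
    word = 0
    for v in vals:
        word = word * 8 + v % 8  # append 3 bits per code, big-endian
    return bytes([word >> 16, (word >> 8) % 256, word % 256])

def unpack_3bit(b3):
    # Inverse: recover the 8 codes from the 24-bit word.
    word = b3[0] * 65536 + b3[1] * 256 + b3[2]
    return [(word >> (21 - 3 * i)) % 8 for i in range(8)]
```

&lt;p&gt;Raw 3-bit storage is 16/3 ≈ 5.3x smaller than fp16; the effective ratio against a given baseline also depends on per-group scales and metadata.&lt;/p&gt;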

</description>
      <category>developer</category>
      <category>opensource</category>
      <category>llm</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Compressed LLM Memory 8.5x in 2 Hours. Here's How.</title>
      <dc:creator>Denis</dc:creator>
      <pubDate>Sun, 29 Mar 2026 19:54:12 +0000</pubDate>
      <link>https://dev.to/remizovdenis/i-compressed-llm-memory-85x-in-2-hours-heres-how-cp0</link>
      <guid>https://dev.to/remizovdenis/i-compressed-llm-memory-85x-in-2-hours-heres-how-cp0</guid>
      <description>&lt;p&gt;I Compressed LLM Memory 8.5x in 2 Hours. Here's How.&lt;/p&gt;

&lt;p&gt;My name is Denis. I'm 28, and I built this while running SecuriLayer.&lt;/p&gt;

&lt;p&gt;The Problem&lt;/p&gt;

&lt;p&gt;LLM inference costs too much, largely because of the KV cache.&lt;/p&gt;

&lt;p&gt;For example: Mixtral 8x7B with a 16k-token context needs 256MB just for the KV cache.&lt;/p&gt;

&lt;p&gt;That means one GPU can serve only 1-2 users, at a cost of $10k+/month.&lt;/p&gt;
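
&lt;p&gt;The post doesn't state the model dimensions behind that 256MB figure; as a rough sketch, the KV-cache footprint follows from layer count, KV-head count, head dimension, and dtype (the config below is an assumption, not taken from the post):&lt;/p&gt;

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, dtype_bytes=2):
    # Keys plus values: 2 tensors per layer, each tokens x kv_heads x head_dim.
    return 2 * layers * kv_heads * head_dim * tokens * dtype_bytes

# One assumed config: 32 layers, a single KV head (multi-query attention)
# of dim 128, fp16, 16k tokens. Grouped-query layouts multiply this
# by the number of KV heads.
total = kv_cache_bytes(layers=32, kv_heads=1, head_dim=128, tokens=16384)
print(total // 2**20, "MiB")  # 256 MiB under these assumed dimensions
```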

&lt;p&gt;The Solution&lt;/p&gt;

&lt;p&gt;I took Google DeepMind's quantization algorithm and implemented it properly.&lt;/p&gt;

&lt;p&gt;It uses orthogonal transforms instead of random rounding.&lt;/p&gt;

&lt;p&gt;Result: 8.5x compression with ZERO quality loss.&lt;/p&gt;

&lt;p&gt;The Numbers&lt;/p&gt;

&lt;p&gt;Before TurboQuant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory: 256MB&lt;/li&gt;
&lt;li&gt;Latency: 78ms&lt;/li&gt;
&lt;li&gt;Cost: $5/user/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After TurboQuant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory: 30MB&lt;/li&gt;
&lt;li&gt;Latency: 9ms&lt;/li&gt;
&lt;li&gt;Cost: $0.60/user/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;87% cost reduction.&lt;/p&gt;

&lt;p&gt;How It Works&lt;/p&gt;

&lt;p&gt;Standard quantization rounds randomly → error concentrates → quality loss.&lt;/p&gt;

&lt;p&gt;TurboQuant uses orthogonal transforms → error spreads → zero loss.&lt;/p&gt;

&lt;p&gt;That's the math that matters.&lt;/p&gt;
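
&lt;p&gt;As an illustration of the rotate-then-round idea (not TurboQuant's actual kernels, which the post doesn't show): a fast Walsh-Hadamard transform is an orthogonal rotation up to scaling, so rounding error introduced in the rotated space spreads evenly across elements when rotated back. A minimal Python sketch:&lt;/p&gt;

```python
def fwht(x):
    # In-place fast Walsh-Hadamard transform: O(N log N) adds and subtracts.
    # N must be a power of 2; applying it twice returns N times the input.
    n = len(x)
    h = 1
    while n > h:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h = h * 2
    return x

def quantize_3bit(vec):
    # Rotate, then round to 7 symmetric levels (-3..3) with one shared scale.
    rot = fwht(list(vec))
    scale = max(abs(v) for v in rot) / 3.0 + 1e-12
    return [round(v / scale) for v in rot], scale

def dequantize_3bit(codes, scale):
    # FWHT is its own inverse up to a factor of N.
    n = len(codes)
    back = fwht([c * scale for c in codes])
    return [v / n for v in back]

vec = [0.5, -0.25, 0.75, -1.0, 0.1, 0.9, -0.6, 0.3]
codes, scale = quantize_3bit(vec)
rec = dequantize_3bit(codes, scale)
err = max(abs(a - b) for a, b in zip(vec, rec))
```

&lt;p&gt;Because the rotation is orthogonal, the per-element reconstruction error stays bounded by half the quantization step instead of concentrating on outlier channels.&lt;/p&gt;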

&lt;p&gt;Installation&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
bash
pip install turboquant-moe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
    </item>
  </channel>
</rss>
