<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aman Sachan</title>
    <description>The latest articles on DEV Community by Aman Sachan (@aman_sachan_126d19c4a2773).</description>
    <link>https://dev.to/aman_sachan_126d19c4a2773</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3905077%2Fb9a51a6d-6ccb-4265-afe4-af43e57b0e81.jpg</url>
      <title>DEV Community: Aman Sachan</title>
      <link>https://dev.to/aman_sachan_126d19c4a2773</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aman_sachan_126d19c4a2773"/>
    <language>en</language>
    <item>
      <title>BTCRouter: Real-Time Bitcoin Fee Estimation Without a Full Node</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:30:42 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/btcrouter-real-time-bitcoin-fee-estimation-without-a-full-node-2a0h</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/btcrouter-real-time-bitcoin-fee-estimation-without-a-full-node-2a0h</guid>
      <description>&lt;p&gt;I built &lt;strong&gt;BTCRouter&lt;/strong&gt; because most Bitcoin wallets estimate fees with a simple multiplier — "slow / medium / fast" — and it's often catastrophically wrong. When the mempool is empty, you overpay. When it's full, you underpay and wait 3 days.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Current Fee Estimation
&lt;/h2&gt;

&lt;p&gt;Full Bitcoin Core nodes solve this properly, but require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;150GB+ storage&lt;/li&gt;
&lt;li&gt;All-day initial sync&lt;/li&gt;
&lt;li&gt;Constant upkeep&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's too heavy for embedded devices, mobile apps, or quick scripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BTCRouter&lt;/strong&gt; uses Blockstream's Electrum API instead — no node required, works from any machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  What BTCRouter Does
&lt;/h2&gt;

&lt;p&gt;A Python library for real-time Bitcoin intelligence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fee Estimation&lt;/strong&gt; — 4 tiers (economy/normal/fast/instant) from live mempool data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Congestion Analysis&lt;/strong&gt; — a 0–100 score of on-chain demand, with actionable recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UTXO Optimization&lt;/strong&gt; — greedy selection minimizing inputs + fees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RBF Simulation&lt;/strong&gt; — model Replace-By-Fee scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy Scoring&lt;/strong&gt; — grade your UTXO set based on address reuse and amount patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How Fee Estimation Works
&lt;/h2&gt;

&lt;p&gt;Blockstream returns fee-rate estimates for different block targets; BTCRouter maps them to four tiers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Block Target&lt;/th&gt;
&lt;th&gt;Typical Wait&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Economy&lt;/td&gt;
&lt;td&gt;24 blocks (~4 hrs)&lt;/td&gt;
&lt;td&gt;~2–4 hrs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Normal&lt;/td&gt;
&lt;td&gt;6 blocks (~1 hr)&lt;/td&gt;
&lt;td&gt;~1 hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;3 blocks (~30 min)&lt;/td&gt;
&lt;td&gt;~30 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;td&gt;1 block&lt;/td&gt;
&lt;td&gt;next block&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
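
&lt;p&gt;In practice that lookup is a single HTTP call. Here is a minimal sketch using only the standard library, assuming Blockstream's public /fee-estimates endpoint (BTCRouter's internal function names may differ):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: map Blockstream's /fee-estimates response to the four tiers
# above. The endpoint returns {"1": 87.8, "2": 80.1, ...} in sat/vB,
# keyed by confirmation target in blocks.
import json
import urllib.request

TIERS = {"economy": 24, "normal": 6, "fast": 3, "instant": 1}

def fetch_tiered_fees():
    url = "https://blockstream.info/api/fee-estimates"
    with urllib.request.urlopen(url, timeout=10) as resp:
        estimates = json.load(resp)
    # Pick the fee rate at each tier's block target.
    return {tier: estimates[str(target)] for tier, target in TIERS.items()}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;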

&lt;h2&gt;
  
  
  UTXO Selection Algorithm
&lt;/h2&gt;

&lt;p&gt;Uses &lt;strong&gt;greedy selection by value&lt;/strong&gt; — sort UTXOs descending, pick largest until total covers target + fee. Minimizes input count → smaller transaction → lower fees.&lt;/p&gt;
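
&lt;p&gt;A minimal sketch of that strategy (a hypothetical helper, not BTCRouter's confirmed signature; the 68-vbyte input size assumes P2WPKH):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Greedy largest-first coin selection. Input and overhead sizes are
# assumptions for P2WPKH; the real selector may differ.
INPUT_VBYTES = 68
BASE_VBYTES = 72  # tx overhead plus two outputs, approximate

def select_utxos(utxos, target_sats, feerate_sat_vb):
    selected, total = [], 0
    # Largest first: fewer inputs, smaller transaction, lower fee.
    for utxo in sorted(utxos, key=lambda u: u["value"], reverse=True):
        selected.append(utxo)
        total += utxo["value"]
        fee = (BASE_VBYTES + INPUT_VBYTES * len(selected)) * feerate_sat_vb
        if total &amp;gt;= target_sats + fee:
            return selected, int(fee)
    raise ValueError("insufficient funds")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;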

&lt;h2&gt;
  
  
  Privacy Scoring
&lt;/h2&gt;

&lt;p&gt;Your UTXO set leaks privacy in subtle ways (a scoring sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Address reuse&lt;/strong&gt; — deduct 10pts per duplicate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Round amounts&lt;/strong&gt; — many UTXOs at exact multiples of 100k sats (likely exchange batches)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One dominant UTXO&lt;/strong&gt; — &amp;gt;90% of value in single UTXO (easy to correlate)&lt;/li&gt;
&lt;/ul&gt;
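
&lt;p&gt;A minimal sketch of the scoring pass; only the 10-point reuse penalty comes from the list above, and the other weights are illustrative assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import Counter

def privacy_score(utxos):
    score = 100
    # Address reuse: deduct 10 pts per duplicate use of an address.
    counts = Counter(u["address"] for u in utxos)
    score -= 10 * sum(c - 1 for c in counts.values())
    # Round amounts: exact multiples of 100k sats suggest exchange batches.
    round_count = sum(1 for u in utxos if u["value"] % 100_000 == 0)
    if round_count &amp;gt; len(utxos) // 2:
        score -= 15  # assumed weight
    # One dominant UTXO: over 90% of value in a single coin is easy to trace.
    total = sum(u["value"] for u in utxos)
    if total and max(u["value"] for u in utxos) &amp;gt; 0.9 * total:
        score -= 20  # assumed weight
    return max(score, 0)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;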

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

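&lt;p&gt;A hedged sketch of typical usage; the class and method names below are assumptions, so check the repo README for the actual API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical usage sketch; names are assumptions, not the
# confirmed BTCRouter API.
from btcrouter import BTCRouter  # drop btcrouter.py next to your script

router = BTCRouter()
fees = router.estimate_fees()   # e.g. {"economy": 2, "fast": 18, ...} sat/vB
score = router.congestion()     # 0-100 congestion score
print(fees["fast"], score)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;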

&lt;h2&gt;
  
  
  When to Use BTCRouter
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Embedded Bitcoin projects (Raspberry Pi, microcontrollers)&lt;/li&gt;
&lt;li&gt;Mobile wallets that can't run a full node&lt;/li&gt;
&lt;li&gt;Trading bots needing accurate fee estimation for batching&lt;/li&gt;
&lt;li&gt;Lightning node operators managing commitment transaction fees&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Single File, Zero Dependencies
&lt;/h2&gt;

&lt;p&gt;Drop btcrouter.py into any project. No node, no 150GB download. Just live Bitcoin intelligence.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/AmSach/btc-router" rel="noopener noreferrer"&gt;https://github.com/AmSach/btc-router&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>bitcoin</category>
      <category>cryptocurrency</category>
      <category>opensource</category>
    </item>
    <item>
      <title>QueryFS - SQL Query Your Filesystem</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 11:59:33 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/queryfs-sql-query-your-filesystem-43pi</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/queryfs-sql-query-your-filesystem-43pi</guid>
      <description>&lt;h2&gt;
  
  
  QueryFS
&lt;/h2&gt;

&lt;p&gt;Query your files with SQL. No database needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;

&lt;p&gt;Mount your filesystem as a queryable database. Run SQL queries against files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;SELECT * FROM /path LIMIT 10&lt;/li&gt;
&lt;li&gt;WHERE clauses: size &amp;gt; 1MB, name LIKE '%.py'&lt;/li&gt;
&lt;li&gt;Output formats: json, csv, table&lt;/li&gt;
&lt;li&gt;Zero dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GitHub
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/AmSach/queryfs" rel="noopener noreferrer"&gt;https://github.com/AmSach/queryfs&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;queryfs
queryfs query &lt;span class="s2"&gt;"SELECT * FROM ~/Documents LIMIT 10"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>python</category>
      <category>cli</category>
      <category>tools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>SoundForge - Clone Any Voice in 10 Seconds, Export to C/WASM/ESP32</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 11:41:44 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/soundforge-clone-any-voice-in-10-seconds-export-to-cwasmesp32-2li5</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/soundforge-clone-any-voice-in-10-seconds-export-to-cwasmesp32-2li5</guid>
      <description>&lt;h2&gt;
  
  
  SoundForge
&lt;/h2&gt;

&lt;p&gt;Voice cloning toolkit that generates portable models you own forever, not rent.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;

&lt;p&gt;Clone any voice in 10 seconds. Export to browser (WASM), ESP32, or standalone C code. No cloud API required after training.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10-second cloning&lt;/strong&gt; - minimal audio input needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portable exports&lt;/strong&gt; - C, WASM, ONNX formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero cloud dependency&lt;/strong&gt; - inference runs locally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform&lt;/strong&gt; - browser, ESP32, Arduino&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Use
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone a voice&lt;/span&gt;
soundforge clone voice_sample.wav

&lt;span class="c"&gt;# Export to ESP32&lt;/span&gt;
soundforge &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="nt"&gt;--target&lt;/span&gt; esp32 &lt;span class="nt"&gt;--output&lt;/span&gt; voice_model.c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GitHub
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/AmSach/soundforge" rel="noopener noreferrer"&gt;https://github.com/AmSach/soundforge&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built for devs who want voice AI without API bills.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>audio</category>
      <category>embedded</category>
      <category>webassembly</category>
    </item>
    <item>
      <title>VoxelNav - Real-time 3D Semantic Mapping for ROS2 Robots</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 11:41:38 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/voxelnav-real-time-3d-semantic-mapping-for-ros2-robots-48o2</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/voxelnav-real-time-3d-semantic-mapping-for-ros2-robots-48o2</guid>
      <description>&lt;h2&gt;
  
  
  VoxelNav
&lt;/h2&gt;

&lt;p&gt;Real-time 3D semantic voxel mapping for ROS2 robots.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Does
&lt;/h3&gt;

&lt;p&gt;Takes LiDAR scans and camera feeds and turns them into labeled 3D voxel maps. It knows which voxels are floor, wall, person, furniture, or door, then feeds those labels to Nav2 for smarter navigation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;O(1) voxel hashing&lt;/strong&gt; - constant-time lookup regardless of map size (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MobileNetV3 segmentation&lt;/strong&gt; - AI labeling of objects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nav2 costmap plugin&lt;/strong&gt; - direct integration with ROS2 navigation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100ms latency&lt;/strong&gt; - real-time on Jetson Nano&lt;/li&gt;
&lt;/ul&gt;
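
&lt;p&gt;The real node is C++, but the hashing idea fits in a few lines. An illustrative Python sketch: quantize each point to integer voxel coordinates and use the tuple as a hash key, giving average-case O(1) insert and lookup no matter how large the map grows:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;VOXEL_SIZE = 0.05  # 5 cm voxels; assumed resolution

voxel_map = {}  # maps (ix, iy, iz) to a semantic label id

def voxel_key(x, y, z):
    # Floor division keeps negative coordinates consistent.
    return (int(x // VOXEL_SIZE), int(y // VOXEL_SIZE), int(z // VOXEL_SIZE))

def insert_point(x, y, z, label):
    voxel_map[voxel_key(x, y, z)] = label

def lookup(x, y, z):
    return voxel_map.get(voxel_key(x, y, z))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;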

&lt;h3&gt;
  
  
  How to Use
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;voxelnav &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; colcon build

&lt;span class="c"&gt;# Run&lt;/span&gt;
ros2 run voxelnav voxelnav_node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GitHub
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/AmSach/voxelnav" rel="noopener noreferrer"&gt;https://github.com/AmSach/voxelnav&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Built for ROS2 robots that need semantic maps without expensive hardware.&lt;/p&gt;

</description>
      <category>robotics</category>
      <category>ros</category>
      <category>cpp</category>
      <category>ai</category>
    </item>
    <item>
      <title>KVQuant: Run 70B LLMs on 8GB RAM with Real-Time KV Cache Compression</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 11:35:54 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/kvquant-run-70b-llms-on-8gb-ram-with-real-time-kv-cache-compression-24p0</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/kvquant-run-70b-llms-on-8gb-ram-with-real-time-kv-cache-compression-24p0</guid>
      <description>&lt;p&gt;I built KVQuant because I wanted to run 70B parameter models on my gaming laptop. The problem? Even with 4-bit quantization, a 128K context window needs 256GB RAM just for the KV cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When you run an LLM, the memory bottleneck is not the model weights - it is the KV cache.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Weights (4-bit)&lt;/th&gt;
&lt;th&gt;KV Cache (128K ctx)&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama-3-8B&lt;/td&gt;
&lt;td&gt;5GB&lt;/td&gt;
&lt;td&gt;64GB&lt;/td&gt;
&lt;td&gt;69GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama-3-70B&lt;/td&gt;
&lt;td&gt;40GB&lt;/td&gt;
&lt;td&gt;256GB&lt;/td&gt;
&lt;td&gt;296GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;KVQuant compresses the KV cache in real-time using per-position adaptive quantization.&lt;/p&gt;

&lt;p&gt;Result: 4-6x compression with less than 1% perplexity increase.&lt;/p&gt;
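
&lt;p&gt;To make "per-position adaptive" concrete, here is a minimal NumPy sketch of the idea: every token position gets its own scale, so one outlier token doesn't blow the error budget for the rest. This is illustrative only; the library's real kernels and bit-packing are more involved:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def quantize_per_position(kv, bits=4):
    # kv: (seq_len, hidden) slice of the KV cache; one scale per token
    qmax = 2 ** (bits - 1) - 1
    scales = np.abs(kv).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0
    q = np.clip(np.round(kv / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;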

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kvquant&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KVQuant&lt;/span&gt;
&lt;span class="n"&gt;compressor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KVQuant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_memory_gb&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compressor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/AmSach/kvquant" rel="noopener noreferrer"&gt;https://github.com/AmSach/kvquant&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>llm</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Compressed GPT-2 to Run on an Arduino ($3 Microcontroller) — Here's How</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 10:51:00 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/i-compressed-gpt-2-to-run-on-an-arduino-3-microcontroller-heres-how-37no</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/i-compressed-gpt-2-to-run-on-an-arduino-3-microcontroller-heres-how-37no</guid>
      <description>&lt;p&gt;I got GPT-2 running on a $3 Arduino. No cloud. No subscription. Just quantization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt;&lt;br&gt;
Local LLMs are great until you try to run them on real hardware. GPT-2 takes 500MB+ just for the KV cache. On an embedded device? Forget it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution: KVQuant&lt;/strong&gt;&lt;br&gt;
I compressed the KV cache from full precision to 1-bit per value using per-channel symmetric quantization. Mixed INT8 for attention scores where precision matters more.&lt;/p&gt;
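
&lt;p&gt;&lt;strong&gt;Sketch of the idea:&lt;/strong&gt;&lt;br&gt;
A minimal NumPy version of 1-bit per-channel symmetric quantization: keep only the sign of each value plus one scale per channel. Illustrative, not the repo's packed kernels:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def quantize_1bit(kv):
    # kv: (tokens, channels); one scale per channel, one sign per value
    scales = np.abs(kv).mean(axis=0, keepdims=True)
    signs = np.sign(kv).astype(np.int8)
    signs[signs == 0] = 1  # treat exact zeros as +1
    return signs, scales

def dequantize_1bit(signs, scales):
    return signs.astype(np.float32) * scales
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;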

&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3.2x faster inference&lt;/li&gt;
&lt;li&gt;73% memory reduction
&lt;/li&gt;
&lt;li&gt;Runs on ESP32-class hardware&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;kvquant&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;QuantizedModel&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QuantizedModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello world&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benchmark:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FP16 GPT-2&lt;/td&gt;
&lt;td&gt;520MB&lt;/td&gt;
&lt;td&gt;2.1s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KVQuant-1b&lt;/td&gt;
&lt;td&gt;140MB&lt;/td&gt;
&lt;td&gt;0.65s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/AmSach/kvquant" rel="noopener noreferrer"&gt;https://github.com/AmSach/kvquant&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't a demo — it's a real quantization library with INT8 kernels and hardware-aware optimizations. Pull the repo, run the examples, see for yourself.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Compressed GPT-2 to Run on an Arduino</title>
      <dc:creator>Aman Sachan</dc:creator>
      <pubDate>Thu, 30 Apr 2026 09:41:45 +0000</pubDate>
      <link>https://dev.to/aman_sachan_126d19c4a2773/i-compressed-gpt-2-to-run-on-an-arduino-3a3f</link>
      <guid>https://dev.to/aman_sachan_126d19c4a2773/i-compressed-gpt-2-to-run-on-an-arduino-3a3f</guid>
      <description>&lt;h2&gt;
  
  
  The Impossible Problem
&lt;/h2&gt;

&lt;p&gt;GPT-2 Small: 124M parameters = ~500MB&lt;/p&gt;

&lt;p&gt;Arduino Uno: 2KB RAM, 32KB Flash&lt;/p&gt;

&lt;p&gt;Gap: ~250,000x&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;I built BitForge - aggressive LLM quantization for microcontrollers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;1-bit to 8-bit quantization&lt;/li&gt;
&lt;li&gt;Adaptive per-layer bit width (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;Pure C99 output&lt;/li&gt;
&lt;li&gt;No dependencies&lt;/li&gt;
&lt;/ul&gt;
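
&lt;p&gt;A minimal sketch of how adaptive per-layer bit width could be assigned; the sensitivity rule and thresholds here are assumptions, not BitForge's confirmed policy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np

def assign_bits(layers):
    # Spend more bits on layers whose weights have a wider dynamic
    # range; the thresholds below are illustrative assumptions.
    plan = {}
    for name, weights in layers.items():
        spread = np.abs(weights).max() / (np.abs(weights).mean() + 1e-8)
        if spread &amp;gt; 50:
            plan[name] = 8   # heavy outliers: keep full 8 bits
        elif spread &amp;gt; 20:
            plan[name] = 4
        elif spread &amp;gt; 8:
            plan[name] = 2
        else:
            plan[name] = 1   # well-behaved layer: 1 bit suffices
    return plan
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;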

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;8x compression achieved&lt;/li&gt;
&lt;li&gt;99.3% correlation preserved&lt;/li&gt;
&lt;li&gt;Tested on ESP32, Arduino, STM32 targets&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;bitforge
bitforge compress gpt2 &lt;span class="nt"&gt;--target&lt;/span&gt; esp32-s3 &lt;span class="nt"&gt;--bits&lt;/span&gt; 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/AmSach/bitforge" rel="noopener noreferrer"&gt;https://github.com/AmSach/bitforge&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>embedded</category>
      <category>tinyml</category>
      <category>python</category>
    </item>
  </channel>
</rss>
