The Impossible Problem
GPT-2 Small: 124M parameters = ~500MB
Arduino Uno: 2KB RAM, 32KB Flash
Gap: roughly 250,000x (500 MB of weights vs. 2 KB of RAM)
The Solution
I built BitForge, an aggressive LLM quantization toolkit for microcontrollers.
What It Does
- 1-bit to 8-bit quantization
- Adaptive per-layer bit width
- Pure C99 output
- No dependencies
Results
- 8x compression achieved (FP32 weights to 4-bit)
- 99.3% correlation preserved
- Tested on ESP32, Arduino, STM32 targets
Try It
pip install bitforge
bitforge compress gpt2 --target esp32-s3 --bits 4