I got GPT-2 running on an Arduino! Here's the quantization pipeline.
Process:
- Q4_K_M quantization via llama.cpp
- Memory-mapped flash for weight storage
- Optimized matvec for ARM Cortex-M
- KV cache quantization
Results:
- Arduino Nano 33 BLE: 3 tokens/sec
- ESP32-S3: 15 tokens/sec
- Raspberry Pi Pico: 8 tokens/sec
Code: github.com/AmSach/bitforge
Minimum hardware: 512KB RAM, 2MB flash.