DEV Community

Aman Sachan


BitForge: Run LLMs on Microcontrollers

I got GPT-2 running on an Arduino! Here's the quantization pipeline.

Process:

  1. Q4_K_M quantization via llama.cpp
  2. Memory-mapped flash for weight storage
  3. Optimized matrix-vector multiply (matvec) for ARM Cortex-M
  4. KV cache quantization
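To make steps 1 and 3 concrete, here is a minimal sketch of block-wise 4-bit quantization and a matvec that reads the packed weights directly. It is an illustration under stated assumptions, not BitForge's code: the layout is a simplified symmetric scheme (one scale per 32 weights), not llama.cpp's actual Q4_K_M format, and the names `quantize_block`, `dequantize_block`, `quantize_row`, and `matvec_q4` are hypothetical. It is written in Python for readability; on a real board this would be C.

```python
# Simplified symmetric 4-bit block quantization.
# Assumption: illustrative layout only, NOT the Q4_K_M format from llama.cpp.
BLOCK = 32  # weights per block (assumed)


def quantize_block(weights):
    """Quantize one block of floats to (scale, 16 packed bytes)."""
    assert len(weights) == BLOCK
    amax = max(abs(w) for w in weights)
    scale = amax / 7.0 if amax > 0 else 1.0
    # Map each weight to a signed code in [-8, 7], then shift to 0..15.
    codes = [max(-8, min(7, round(w / scale))) + 8 for w in weights]
    # Pack two 4-bit codes per byte: low nibble first, high nibble second.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, BLOCK, 2))
    return scale, packed


def dequantize_block(scale, packed):
    """Recover approximate float weights from one packed block."""
    out = []
    for b in packed:
        out.append(((b & 0x0F) - 8) * scale)
        out.append(((b >> 4) - 8) * scale)
    return out


def quantize_row(row):
    """Quantize a row whose length is a multiple of BLOCK."""
    assert len(row) % BLOCK == 0
    return [quantize_block(row[i:i + BLOCK]) for i in range(0, len(row), BLOCK)]


def matvec_q4(q_rows, x):
    """Compute y = W @ x with W stored as quantized rows.

    Nibbles are unpacked on the fly, so the full float matrix is never
    materialized; the scale is applied once per block, not once per weight.
    """
    y = []
    for q_row in q_rows:
        acc = 0.0
        for bi, (scale, packed) in enumerate(q_row):
            base = bi * BLOCK
            block_sum = 0.0
            for j, b in enumerate(packed):
                block_sum += ((b & 0x0F) - 8) * x[base + 2 * j]
                block_sum += ((b >> 4) - 8) * x[base + 2 * j + 1]
            acc += scale * block_sum
        y.append(acc)
    return y
```

Applying the scale once per block is the main trick for small cores: the inner loop only does nibble unpacking and multiply-accumulate, which is the part an optimized Cortex-M kernel would target.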

Results:

  • Arduino Nano 33 BLE: 3 tokens/sec
  • ESP32-S3: 15 tokens/sec
  • Raspberry Pi Pico: 8 tokens/sec

Code: github.com/AmSach/bitforge

Hardware requirements: 512 KB RAM, 2 MB flash.
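For a rough sense of what those limits allow, here is a small budget calculation. It is a sketch under assumptions: the weight layout is the simplified one of 32 packed 4-bit codes plus a 16-bit scale per block (18 bytes per 32 weights), and the model dimensions passed to `kv_cache_bytes` are hypothetical placeholders, not BitForge's actual configuration.

```python
FLASH_BYTES = 2 * 1024 * 1024  # 2 MB flash, from the stated requirement
RAM_BYTES = 512 * 1024         # 512 KB RAM, from the stated requirement

BLOCK = 32                     # weights per block (assumed layout)
BYTES_PER_BLOCK = 16 + 2       # 32 packed 4-bit codes + one 16-bit scale


def max_weights(flash_bytes, reserved_bytes=0):
    """Upper bound on 4-bit weights that fit in flash after reserving space."""
    usable = flash_bytes - reserved_bytes
    return (usable // BYTES_PER_BLOCK) * BLOCK


def kv_cache_bytes(n_layers, n_ctx, d_model, bits_per_value):
    """Size of the KV cache: keys and values for every layer and position."""
    return 2 * n_layers * n_ctx * d_model * bits_per_value // 8


# All of flash devoted to weights (firmware also lives in flash, so the
# real figure is lower).
print(max_weights(FLASH_BYTES))  # 3728256

# Hypothetical dimensions: 4 layers, 128-token context, width 128.
print(kv_cache_bytes(4, 128, 128, 16))  # 262144 (half of the 512 KB RAM)
print(kv_cache_bytes(4, 128, 128, 8))   # 131072
print(kv_cache_bytes(4, 128, 128, 4))   # 65536
```

With this layout, 2 MB of flash holds at most about 3.7 million weights, and dropping the KV cache from 16-bit to 4-bit values cuts its RAM footprint by 4x, which is why step 4 matters on a 512 KB budget.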
