I got GPT-2 running on an Arduino! Here's the quantization pipeline.
Process:
- Q4_K_M quantization via llama.cpp
- Memory-mapped flash for weight storage
- Optimized matvec for ARM Cortex-M
- KV cache quantization
Results:
- Arduino Nano 33 BLE: 3 tokens/sec
- ESP32-S3: 15 tokens/sec
- Raspberry Pi Pico: 8 tokens/sec
Code: github.com/AmSach/bitforge
Minimum hardware: 512KB RAM, 2MB flash.