To take a neural network you trained in Python and run it on a microcontroller, the winning pattern is:
Train in Python → shrink (INT8 quantize) → convert/export → compile into firmware → run inference with a tiny runtime (TFLite Micro / vendor runtime) → validate speed + RAM/Flash.
Below are the most common routes and a concrete end-to-end example.
Route choices (pick one)
A) TensorFlow → TensorFlow Lite (TFLite) → TFLite Micro
Most common “generic MCU” workflow:
- Convert with the TFLite Converter.
- Quantize (usually INT8) using post-training quantization or QAT.
- Run on-device with TFLite Micro (a small C++ runtime).
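For reference, the unquantized conversion is tiny; a minimal sketch just to exercise the converter (the SavedModel path is a placeholder), with the INT8 version used for MCUs shown in the end-to-end example below:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
with open("model_fp32.tflite", "wb") as f:
    f.write(converter.convert())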
B) STM32 specifically → STM32Cube.AI / X-CUBE-AI
If your target is STM32, this is often the fastest “it just works” path:
Import a model from popular frameworks, optimize it, and generate STM32 project code via CubeMX/Cube.AI.
C) Hand-optimized kernels on Cortex-M → CMSIS-NN
For maximum speed/small footprint (but more manual work):
Use CMSIS-NN’s optimized NN kernels for Cortex-M.
D) Compile with TVM → microTVM
If you want compiler-driven optimization and AOT builds:
TVM’s microTVM can compile TFLite models to embedded targets.
The “standard” pipeline (works on most MCUs)
1) Design for MCU constraints (before training)
- Prefer small architectures: DS-CNN / tiny CNN / small MLP
- Avoid heavy ops not supported on micro runtimes
- Know your budget (a quick size check is sketched below):
  - Flash holds the model + code
  - RAM holds tensors (the "arena"), stacks, buffers
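A minimal host-side sanity check, assuming you already have the converted model_int8.tflite from step 3 below and substituting your MCU's real numbers for the placeholder budgets:
import os

FLASH_BUDGET_KB = 512   # total Flash on the target (placeholder value)
FIRMWARE_KB = 150       # rough size of application + runtime (placeholder value)

model_kb = os.path.getsize("model_int8.tflite") / 1024
print(f"Model: {model_kb:.1f} KB, "
      f"Flash headroom: {FLASH_BUDGET_KB - FIRMWARE_KB - model_kb:.1f} KB")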
2) Quantize to INT8 (huge enabler)
INT8 typically cuts model size ~4× and speeds up inference on MCUs (especially Cortex-M with optimized kernels). Quantization options are documented by Google.
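If post-training quantization costs too much accuracy, quantization-aware training is the usual fallback. A minimal sketch using the TensorFlow Model Optimization toolkit, assuming an existing Keras model and train_ds dataset (both placeholders here):
import tensorflow_model_optimization as tfmot

# Wrap the float model with fake-quantization nodes, then fine-tune as usual
qat_model = tfmot.quantization.keras.quantize_model(model)
qat_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
qat_model.fit(train_ds, epochs=5)
# The QAT model then goes through the same TFLite INT8 conversion shown below.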
3) Convert to .tflite
Use the TFLite Converter (recommended starting point in official docs).
4) Turn .tflite into a C array and compile into firmware
Common approach: embed the model as const unsigned char g_model[] = {...}; typically generated with xxd -i model_int8.tflite > model_data.cc, or with a short script like the one below.
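A portable alternative to xxd (handy on machines that don't have it); file and symbol names are just examples:
with open("model_int8.tflite", "rb") as f:
    data = f.read()

with open("model_data.cc", "w") as out:
    out.write("// Auto-generated from model_int8.tflite\n")
    out.write("const unsigned char g_model[] = {\n")
    for i in range(0, len(data), 12):
        out.write("  " + ", ".join(f"0x{b:02x}" for b in data[i:i+12]) + ",\n")
    out.write("};\n")
    out.write(f"const unsigned int g_model_len = {len(data)};\n")
Keep the array const so the linker places it in Flash rather than RAM.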
5) Run inference on MCU (TFLite Micro idea)
- Create an interpreter
- Provide a tensor arena (static RAM buffer)
- Feed input tensor → Invoke() → read output tensor
- Measure latency and tune arena size
Minimal end-to-end example (TensorFlow → TFLite INT8 → MCU)
Python: convert + INT8 quantize (skeleton)
import tensorflow as tf

# 1) Load your SavedModel (or Keras model)
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# 2) INT8 post-training quantization (representative dataset required for full int8)
def rep_data_gen():
    for _ in range(200):
        # yield a batch shaped like your model input
        yield [tf.random.uniform([1, 128], minval=-1, maxval=1, dtype=tf.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = rep_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
(The converter and post-training quantization are covered in Google's docs.)
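Before flashing anything, it is worth a quick sanity check of the INT8 model on the host with the standard TFLite interpreter; a sketch, assuming the placeholder [1, 128] input shape from above:
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Quantize a float test vector into the int8 input domain
scale, zero_point = inp["quantization"]
x = np.random.uniform(-1, 1, size=inp["shape"]).astype(np.float32)
x_q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)

interpreter.set_tensor(inp["index"], x_q)
interpreter.invoke()
print("int8 output:", interpreter.get_tensor(out["index"]))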
C/C++ (MCU): TFLite Micro inference flow (concept)
// Sketch of the TFLite Micro C++ API (header paths and constructor details
// vary slightly across TFLM versions; older releases also take an ErrorReporter)
#include "model_data.h"  // const unsigned char g_model[]
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

static uint8_t tensor_arena[30 * 1024];  // tune this

int main(void) {
  // init clocks, UART, etc.
  const tflite::Model* model = tflite::GetModel(g_model);

  // Register only the ops your model actually uses (keeps code size down)
  static tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter interpreter(model, resolver,
                                              tensor_arena, sizeof(tensor_arena));
  interpreter.AllocateTensors();

  TfLiteTensor* input = interpreter.input(0);
  // Fill input->data.int8[] (values already in the int8 quantized domain)
  interpreter.Invoke();  // time this call to measure latency

  TfLiteTensor* output = interpreter.output(0);
  // Use output->data.int8[] ...
}
Practical tips that save days
- Start with a known tiny example (keyword spotting, gesture, simple classifier) to validate your toolchain, then swap in your model.
- Arena sizing: if AllocateTensors() fails, increase tensor_arena; if the arena you need ends up huge, simplify the model.
- Operator support: if conversion succeeds but inference fails, it’s often an unsupported op (or requires a different quantization strategy).
- On STM32, seriously consider STM32Cube.AI for a smoother workflow + codegen.
- On Cortex-M, enabling CMSIS-NN kernels can give big speedups.
