Deploying YOLOv8 on RK3566 Using RKNN Toolkit: Notes, Pitfalls, and Benchmarks

#yolov8 #edgeai #rk3566 #embedded

Running YOLOv8 on RK3566 is a practical choice for edge AI devices where cost, thermal stability, and NPU acceleration matter.

This post summarizes the key technical steps, conversion notes, and pitfalls we encountered while deploying YOLOv8 on an RK3566 board — without repeating the full tutorial.

(Full detailed tutorial is linked at the end.)

1. Model Preparation

YOLOv8 exports cleanly to ONNX, but RKNN Toolkit can only convert operators it supports, so the export settings matter.

Recommended export:

yolo export model=yolov8n.pt format=onnx opset=12 dynamic=False

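Why opset=12?
RKNN Toolkit (especially when targeting RK3566) is most stable with ONNX opset 11–13. Newer opsets may break conversion of resize/activation layers. dynamic=False keeps the input shape static (640×640 by default), which the rest of this post assumes.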

2. Convert ONNX → RKNN

Using RKNN-Toolkit2:

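from rknn.api import RKNN

rknn = RKNN()

# Preprocessing is folded into the model: (pixel - mean) / std maps 0..255 to 0..1,
# which matches YOLOv8's training normalization.
rknn.config(
    mean_values=[[0, 0, 0]],
    std_values=[[255, 255, 255]],
    target_platform='rk3566',
    # RKNN-Toolkit2 spells this 'asymmetric_quantized-8' (newer 2.x releases call it
    # 'w8a8'); 'asymmetric_quantized-u8' is the RKNN-Toolkit 1.x name. Check your version.
    quantized_dtype='asymmetric_quantized-8'
)

# Each call returns 0 on success
if rknn.load_onnx(model='yolov8n.onnx') != 0:
    raise RuntimeError('load_onnx failed')

# dataset.txt lists the calibration images, one path per line
if rknn.build(do_quantization=True, dataset='./dataset.txt') != 0:
    raise RuntimeError('build failed')

# Write the .rknn file that the inference step loads
if rknn.export_rknn('yolov8n.rknn') != 0:
    raise RuntimeError('export_rknn failed')

rknn.release()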
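The dataset argument passed to the build step is a plain text file with one calibration image path per line. A minimal sketch for generating it is below; the helper name write_calibration_list, the folder layout, and the cap of 200 images are assumptions for illustration, not values from our setup.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def write_calibration_list(image_dir, list_path="dataset.txt", max_images=200, seed=0):
    """Write one absolute image path per line and return how many were written.

    max_images=200 is an arbitrary placeholder, not a recommended value.
    """
    images = sorted(
        p for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise FileNotFoundError(f"no images found under {image_dir}")

    # Fixed seed so the same subset is picked on every run.
    random.Random(seed).shuffle(images)
    selected = sorted(images[:max_images])

    Path(list_path).write_text("".join(f"{p.resolve()}\n" for p in selected))
    return len(selected)
```

Point image_dir at frames captured from the real deployment scene rather than at a generic public dataset, since the quantizer calibrates on whatever this list contains.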

3. Inference on RK3566

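Minimal runtime example, run on the board with RKNN-Toolkit-Lite2 (from a host PC, the full toolkit's RKNN class also works if init_runtime is given target='rk3566'):

import cv2
import numpy as np
from rknnlite.api import RKNNLite  # on-board runtime

rknn_lite = RKNNLite()
if rknn_lite.load_rknn('yolov8n.rknn') != 0:
    raise RuntimeError('load_rknn failed')
if rknn_lite.init_runtime() != 0:
    raise RuntimeError('init_runtime failed')

img = cv2.imread('test.jpg')                 # OpenCV loads images as BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # YOLOv8 expects RGB
img_resized = cv2.resize(img, (640, 640))    # plain resize; see the resize pitfall below
input_data = np.expand_dims(img_resized, 0)  # NHWC with a batch dimension

outputs = rknn_lite.inference(inputs=[input_data])
rknn_lite.release()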

Post-processing (NMS, decoding) runs on the ARM CPU.
Optimizing this step often gives the biggest FPS improvement.
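As a rough illustration of what that CPU-side step involves, here is a vectorized NumPy sketch of decoding plus NMS. It assumes the model keeps the stock Ultralytics head and returns a single tensor of shape (1, 4 + num_classes, num_anchors) with boxes as (cx, cy, w, h) in input pixels; models exported with a modified head produce different outputs and need different decoding. The function names and thresholds are illustrative choices.

```python
import numpy as np


def xywh_to_xyxy(boxes):
    """Convert rows of (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes.T
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)


def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS; IoU against all remaining boxes is computed in one vectorized step."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(i)
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter + 1e-9)
        order = rest[iou <= iou_thres]  # drop boxes that overlap the kept one
    return np.array(keep, dtype=np.int64)


def postprocess(output, conf_thres=0.25, iou_thres=0.45, max_wh=7680):
    """Decode one (1, 4 + num_classes, num_anchors) tensor into boxes, scores, class ids."""
    preds = output[0].T                       # (num_anchors, 4 + num_classes)
    class_scores = preds[:, 4:]
    class_ids = class_scores.argmax(axis=1)
    scores = class_scores.max(axis=1)

    mask = scores >= conf_thres               # filter before NMS to keep it cheap
    boxes = xywh_to_xyxy(preds[mask, :4])
    scores, class_ids = scores[mask], class_ids[mask]

    # Shift boxes by a per-class offset so NMS never suppresses across classes.
    offsets = class_ids[:, None] * float(max_wh)
    keep = nms(boxes + offsets, scores, iou_thres)
    return boxes[keep], scores[keep], class_ids[keep]
```

Filtering by confidence before NMS is usually where most of the time is saved, because only a small fraction of the candidate boxes survives the threshold.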

4. Performance Notes (Real Tests)

Approximate results on RK3566 NPU:

| Model   | Precision | FPS   |
| ------- | --------- | ----- |
| YOLOv8n | INT8      | 16–22 |
| YOLOv8n | FP32      | 4–6   |
| YOLOv8s | INT8      | 8–12  |
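FPS figures like these depend on how they are measured, so it helps to time a fixed number of runs after a warm-up. The sketch below is one simple way to do that, not the exact harness behind the table; measure_fps and its default counts are illustrative.

```python
import time


def measure_fps(run_once, warmup=10, iterations=100):
    """Call run_once repeatedly and return the average frames per second."""
    # Warm-up runs are excluded so one-time setup cost does not skew the result.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

Passing a callable that wraps only the NPU inference call, and then one that also includes post-processing, shows how much of the frame time is spent on the CPU side.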

Quantization accuracy is highly dependent on dataset representativeness — especially for small-object detection.

5. Common Pitfalls

✔ 1. Quantization dataset too small
INT8 accuracy drops significantly without enough real-scene samples.

✔ 2. Resize mismatch
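YOLOv8 is trained and validated on letterboxed inputs (aspect-preserving resize plus padding). A raw resize, like the one in the minimal example above, stretches objects and makes detections less stable.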

✔ 3. Preprocessing not aligned
YOLOv8 trains on RGB input scaled to 0–1. If the mean/std values or channel order given to RKNN differ from that, confidence scores drift.

✔ 4. CPU-side post-processing bottleneck
Consider:

- vectorized NumPy
- C++ post-processing
- offloading compatible steps to RKNN ops
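For the resize pitfall, the geometry of letterboxing can be sketched in a few lines: compute one scale factor for both axes, pad the remainder, and undo both when mapping boxes back to the original image. The function names here are illustrative and a square 640 input is assumed.

```python
def letterbox_params(src_w, src_h, dst=640):
    """Return (scale, pad_x, pad_y, new_w, new_h) for an aspect-preserving resize."""
    scale = min(dst / src_w, dst / src_h)     # same factor for both axes
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst - new_w) // 2                # left padding in pixels
    pad_y = (dst - new_h) // 2                # top padding in pixels
    return scale, pad_x, pad_y, new_w, new_h


def unletterbox_box(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from model-input pixels back to the source image."""
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )
```

With these values, the frame is resized to (new_w, new_h) with cv2.resize and padded out to 640×640 with cv2.copyMakeBorder; Ultralytics pads with gray (114, 114, 114), so using the same fill keeps inference consistent with training. For a 1280×720 frame this gives a scale of 0.5 and 140 pixels of padding above and below.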

Full Tutorial With Code + Benchmarks

📌 Full step-by-step guide:
👉 https://zediot.com/blog/how-to-deploy-yolov8-on-rk3566/

Need Help?

Working on RK3566/RK3588 deployments or YOLO/TensorRT optimization on edge hardware?

We help teams with quantization, model conversion, NPU optimization, and embedded integration. If you're running into conversion errors or performance drops, feel free to reach out — happy to help. Contact us here: https://zediot.com/contact/