Associate III
September 22, 2025
Question

Running a YOLO11 model on an STM32MP2.


I used a YOLO11 model:

cd /opt/ST/STEdgeAI/2.2/Utilities/linux/

./stedgeai generate -m /home/alientek/STM32MPU_workspace/yolo11.onnx --target stm32mp25

This produced a yolo11.nb file.

On my STM32MP2 board, I ran:

x-linux-ai-benchmark -m ./yolo11n.nb

╔═══════════════════════════════════════╗
║ X-LINUX-AI unified NN model benchmark ║
╠═════════════════════════╦═════════════╣
║ Machine                 ║ STM32MP257  ║
║ CPU cores               ║ 2           ║
║ CPU clock frequency     ║ 1.5 GHz     ║
║ GPU/NPU driver version  ║ 6.4.19      ║
║ GPU/NPU clock frequency ║ 800 MHz     ║
║ X-LINUX-AI version      ║ v6.0.0      ║
╚═════════════════════════╩═════════════╝
For hardware-accelerated models, the computation engine used for the benchmark is the NPU running at 800 MHz.
For other models, the computation engine used for the benchmark is the CPU with 2 cores at 1.5 GHz.
╔══════════════════════════════════════════════════════════════════════════╗
║                           NBG models benchmark                           ║
╠════════════╦═════════════════════╦═══════╦═══════╦═══════╦═══════════════╣
║ Model Name ║ Inference Time (ms) ║ CPU % ║ GPU % ║ NPU % ║ Peak RAM (MB) ║
╠════════════╬═════════════════════╬═══════╬═══════╬═══════╬═══════════════╣
║ yolo11n    ║ 1043.37             ║ 0.0   ║ 96.23 ║ 3.77  ║ 30.02         ║
╚════════════╩═════════════════════╩═══════╩═══════╩═══════╩═══════════════╝
╔══════════════════════════════════════════════════════════════╗
║                      Non-Optimal models                      ║
╠════════════╦═════════════════════════════════════════════════╣
║ Model Name ║ Comments                                        ║
╠════════════╬═════════════════════════════════════════════════╣
║ yolo11n    ║ GPU usage is 96.23% compared to NPU usage 3.77% ║
║            ║ please verify if the model is quantized or that ║
║            ║ the quantization scheme used is the 8-bits per- ║
║            ║ tensor                                          ║
╚════════════╩═════════════════════════════════════════════════╝

The inference time is 1043.37 ms.
1 reply

Julian E.
Technical Moderator
September 22, 2025

Hello @fanronghua0123456,

 

What is the issue here?

 

Have a good day,

Julian

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
Associate III
September 22, 2025

The inference time is 1043.37 ms, which is too long.

Julian E.
Technical Moderator
September 30, 2025

Hello @fanronghua0123456,

 

This part of the message seems to indicate that your model is not quantized per-tensor (it may be quantized per-channel, or not quantized at all):
╠════════════╬═════════════════════════════════════════════════╣
║ yolo11n    ║ GPU usage is 96.23% compared to NPU usage 3.77% ║
║            ║ please verify if the model is quantized or that ║
║            ║ the quantization scheme used is the 8-bits per- ║
║            ║ tensor                                          ║
╚════════════╩═════════════════════════════════════════════════╝

The NPU is around 10x faster than the GPU, and here your model is running mainly on the GPU.

The NPU can only be used with per-tensor uint8 quantized models, and I don't think that is the case here.

Here are some elements (first figure):

How to deploy your NN model on STM32MPU - stm32mpu

 

Have a good day,

Julian

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.