Associate III
June 18, 2025
Solved

Documentation on how much the NPU can accelerate an operator

  • June 18, 2025
  • 1 reply
  • 370 views

Hello,

I'm interested in understanding how much an NPU can accelerate a model. I reviewed the documentation for Neural Art and found that 38 TFLite operators are supported by the hardware:

ABS, ADD, AVERAGE_POOL_2D, BATCH_MATMUL, CAST, CEIL, CONCATENATION, CONV_2D, DEPTHWISE_CONV_2D, EQUAL, EXPAND_DIMS, FULLY_CONNECTED, HARD_SWISH, LEAKY_RELU, LOGICAL_AND, LOGICAL_NOT, LOGICAL_OR, LOGISTIC, MAX_POOL_2D, MUL, PACK, PAD, PRELU, (RE)QUANTIZE, RELU, RELU6, RESHAPE, RESIZE_NEAREST_NEIGHBOR (with coordinate_transformation_mode=asymmetric and nearest_mode=floor), SPACE_TO_DEPTH (with same input/output quantization), SPLIT, SPLIT_V, STRIDED_SLICE, SQUEEZE, SUB, TANH, TRANSPOSE, TRANSPOSE_CONV, UNPACK.

However, each operator has a different level of computational complexity. Therefore, the acceleration ratio provided by the NPU may vary depending on which operators are used. For example, a model that heavily uses CONV layers may benefit more from NPU acceleration than one that primarily uses ADD operations.
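To make the intuition concrete (a rough back-of-the-envelope sketch, not from any ST documentation; the tensor shapes are illustrative only): comparing the multiply-accumulate (MAC) count of a CONV_2D layer against the element-wise operation count of an ADD on the same feature map shows why their potential NPU speedups can differ by orders of magnitude.

```python
# Rough arithmetic-intensity comparison between CONV_2D and ADD
# on the same 56x56x64 feature map. Shapes are illustrative only.

def conv2d_macs(h, w, c_in, c_out, k):
    """MACs for a stride-1, same-padded KxK convolution."""
    return h * w * c_out * (k * k * c_in)

def add_ops(h, w, c):
    """Element-wise additions for an ADD of two HxWxC tensors."""
    return h * w * c

conv = conv2d_macs(56, 56, 64, 64, 3)   # 3x3 conv, 64 -> 64 channels
add = add_ops(56, 56, 64)

print(f"CONV_2D MACs: {conv:,}")   # 115,605,504
print(f"ADD ops:      {add:,}")    # 200,704
print(f"ratio:        {conv // add}x")  # 576x
```

So for this shape a single CONV_2D carries ~576x the arithmetic of an ADD, which is why a CONV-heavy model has far more work that the NPU's MAC array can absorb.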

Do you have any documentation or benchmarks that show the computational intensity or NPU acceleration ratio for each of these supported operators, compared to running them on the CPU?

1 reply

Julian E. (Best answer)
Technical Moderator
June 25, 2025

Hello @Einstein_rookie_version,

 

We don't have such a benchmark, as a layer is not mapped one-to-one onto an epoch.

The only benchmarks I can redirect you to are the model benchmarks from the model zoo, for example:

stm32ai-modelzoo/image_classification/efficientnet/README.md at main · STMicroelectronics/stm32ai-modelzoo · GitHub

 

In the tables you will find inference times for multiple models and boards. There is no direct comparison between the N6 with and without the NPU being used.

 

You can also run your own benchmark with the validate-on-target command of the ST Edge AI Core:

 https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_getting_started.html 
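For reference, a validate-on-target run with the ST Edge AI CLI might look like the sketch below. The model filename is a placeholder and the exact flags should be checked against `stedgeai validate --help` and the getting-started guide linked above for your board and tool version.

```shell
# Sketch only: validate a quantized TFLite model on an STM32N6 target.
# Model path is a placeholder; verify flags for your tool version.
stedgeai validate --model my_model_int8.tflite \
                  --target stm32n6 \
                  --mode target
```

The report produced by the run includes measured on-device inference time, which you can compare against a CPU-only build of the same model.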

 

Have a good day,

Julian
