Question about RSQRT implementation on STM32N6 Neural-ART NPU

February 18, 2026

Hello ST Support Team,
I am using the STM32N6 with the Neural-ART NPU (ST Edge AI 3.0) and comparing inference outputs between:

1. TFLite on PC, and
2. the converted model running on STM32N6 NPU.

Could you please clarify how `rsqrt` is implemented on STM32N6 NPU?

I would like to know:

1. Is `rsqrt` a native HW operator, or is it decomposed (for example into `sqrt + reciprocal`)?
2. What approximation method is used internally (LUT / polynomial / Newton-Raphson / other)?
3. What numeric format is used in the HW path (fixed-point details, precision)?
4. What rounding and saturation rules are applied?
5. Are there documented error bounds or expected max/mean error vs float reference?
6. Is there any compiler/runtime option to select a more accurate vs faster mode for this operation?
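
To make questions 3 to 5 concrete, below is a minimal sketch of the float reference I have in mind: dequantize the INT8 input, compute `1/sqrt(x)` in floating point, then requantize with round-to-nearest (ties to even) and saturation to [-128, 127]. The function name `rsqrt_int8_reference` and all scale/zero-point values are hypothetical and chosen for illustration only. This is not a description of how ST Edge AI or the NPU computes `rsqrt`, which is exactly what I am asking about.

```python
import math

def rsqrt_int8_reference(q_in, in_scale, in_zp, out_scale, out_zp):
    """Float reference for a quantized rsqrt (illustrative, not ST's method)."""
    out = []
    for q in q_in:
        x = (q - in_zp) * in_scale             # dequantize to a real value (must be > 0)
        y = 1.0 / math.sqrt(x)                 # reference rsqrt in floating point
        q_out = round(y / out_scale) + out_zp  # Python round(): ties to even
        out.append(max(-128, min(127, q_out))) # saturate to the INT8 range
    return out

# Hypothetical quantization parameters, for illustration only.
q_in = [-127, -64, 0, 127]
q_ref = rsqrt_int8_reference(q_in, in_scale=1 / 64, in_zp=-128,
                             out_scale=1 / 16, out_zp=-128)
print(q_ref)
```

If the HW path uses a different rounding mode or an intermediate fixed-point format, I would expect differences of one or more LSBs against this reference.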

My model uses INT8 I/O and the same input tensor on both platforms.
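
For the comparison itself, a sketch along the following lines could be used to measure the mismatch between the two INT8 outputs. The helper `compare_outputs` is a hypothetical name, and the tensors and quantization parameters below are placeholders, not measured data from my model.

```python
def compare_outputs(q_ref, q_npu, scale, zero_point):
    """Compare two INT8 output tensors (flat lists) sharing one scale/zero-point."""
    ref = [(q - zero_point) * scale for q in q_ref]  # dequantized reference output
    npu = [(q - zero_point) * scale for q in q_npu]  # dequantized target output
    abs_err = [abs(a - b) for a, b in zip(ref, npu)]
    return {
        "max_abs_err": max(abs_err),
        "mean_abs_err": sum(abs_err) / len(abs_err),
        # Largest difference in raw INT8 codes (LSBs)
        "max_lsb_diff": max(abs(a - b) for a, b in zip(q_ref, q_npu)),
    }

# Placeholder values: the second tensor differs from the first by one LSB.
tflite_out = [0, -112, -117, -120]
target_out = [0, -112, -116, -120]
stats = compare_outputs(tflite_out, target_out, scale=1 / 16, zero_point=-128)
print(stats)
```

Reporting both the real-valued error and the LSB difference makes it easier to tell a rounding-mode mismatch (typically 1 LSB) from a coarser approximation.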

Thank you.

Julian E.

Technical Moderator

Hi @retertert,

RSQRT is not supported on the NPU:

https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_operator_support.html

It seems to be supported only for TFLite models (not ONNX):

https://stedgeai-dc.st.com/assets/embedded-docs/supported_ops_tflite.html#rsqrt

I have asked for more details on its implementation.

Regarding your last two points, we don't provide such details for layers, and there is no way to customize the behavior of a particular layer. For targets without an NPU, it is possible to "optimize" the model in this way, but only for the whole model, not for individual layers.

Have a good day,

Julian

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.