Stedgeai tool fails to generate outputs for the STM32N6's Neural-ART NPU
Hello,
I am trying to run a Transformer model with Multi-head self attention on the STM32N6570-DK board.
The model is this one: https://github.com/arx7ti/cold-nilm
Here is what I did so far:
- Convert the .ckpt I got from the above repo into a .onnx format.
- Import and Quantize the model using your ST Edge AI Developer Cloud tool.
- Try to use stedgeai `generate` command:
- From the standalone tool on my Linux machine
- From the STM32CubeMX plugin
- From the ST Edge AI Developer Cloud
--> None of those attempts was successful. Here is the command I used in the first case:
stedgeai analyze --model cold_stm32_float32_final_ready_PerChannel_quant_random_1.onnx --st-neural-art default@user_neuralart_STM32N6570-DK.json --target stm32n6 --name network --workspace workspace --output output

I got several issues that I tried to fix so that the Developer Cloud tool would quantize my initial ONNX model:
- Removing Constant operations
- Converting LayerNormalization layers into compatible layers
- Removing Pow operations
- Removing Reshape allowzero attributes
- Removing ReduceMean noop_with_empty_axes attributes
- ...
I cannot guarantee that every change was necessary. To be honest, I did not fully understand each fix; I just tried to clear the error messages one by one.
But I did not succeed in making it work. I always got this message (as in this post):

ST Edge AI Core v2.2.0-20266 2adc00962
INTERNAL ERROR: Exported ONNX could be malformed since ONNX shape inference fails

Now, I am pretty confident that the issue is that the model type (Transformer-based) is not compatible with the stedgeai tool. Moreover, I have read here:
"It is specifically designed to accelerate the inference execution of a wide range of quantized convolutional neural network (CNN) models in area-constrained embedded and IoT devices."
Here is my question: is there absolutely no way to run such a model on the NPU?
PS: I have added the original .ckpt model, the generated ONNX and the quantized one in an attached archive. Do not hesitate to have a look at them!
