Does compressing the model speed up inference (prediction)?
Hi
I imported a simple CNN to an STM32L462RCT using the STM32CUBE-AI v5.1.2 ApplicationTemplate.
I found that compressing the model has no effect on inference time.
The aiRun procedure takes 115 ms in both the 8-bit compression and "none" configurations, although the accuracy drops a bit with compression.
I thought compressing the float network parameters to uint8_t would not only save memory but also speed up inference.
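To make the expectation concrete, here is a minimal sketch in C of one way the timing could stay the same. It assumes (this is only an assumption, not something confirmed for STM32CUBE-AI) that the compressed weights are stored as uint8_t indices into a float lookup table and are expanded back to float during the computation. The names dot_f32, dot_u8_lut, idx and lut are hypothetical and are not part of any generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Baseline: weights stored as 32-bit floats (4 bytes per weight). */
float dot_f32(const float *w, const float *x, size_t n)
{
    assert(w != NULL && x != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += w[i] * x[i];        /* one float multiply-accumulate per weight */
    }
    return acc;
}

/* Hypothetical compressed form (assumption): each weight is a 1-byte index
 * into a table of float values. Storage shrinks, but the arithmetic is
 * still done in float. */
float dot_u8_lut(const uint8_t *idx, const float *lut, const float *x, size_t n)
{
    assert(idx != NULL && lut != NULL && x != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += lut[idx[i]] * x[i]; /* same float multiply-accumulate, plus a table lookup */
    }
    return acc;
}
```

If the compression works like this, the weights take roughly a quarter of the space, but the number of float multiply-accumulates is identical, which would be consistent with the unchanged 115 ms. A speed-up would need the arithmetic itself to be done on integers.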
So, is compressing the model supposed to speed up inference?
