STM32U5 SIMD usage (or any other trick to speed up math)
Hello,
In my application I need to run a double for loop over data stored in SRAM, and I would love to speed up my program as much as possible.
uint32_t xs = 0, ys = 0, sum = 0, i = 0;
for (uint32_t x = 0; x < 600; x++)
{
    for (uint32_t y = 0; y < 600; y++)
    {
        xs += DATA[i] * x;
        ys += DATA[i] * y;
        sum += DATA[i];
        i++;
    }
}
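For anyone who wants to reproduce this on a host machine, here is a minimal self-contained sketch of the same computation. The post does not give the element type of DATA, so uint8_t is an assumption, and the names N, moments_t, moments_naive and moments_rowsum are hypothetical. The second function is one possible restructuring (accumulate a per-row sum and multiply by x once per row) that gives the same results. Whether it is actually faster on the STM32U5 would need measuring.

```c
#include <stdint.h>

#define N 600u  /* side length, taken from the loop bounds in the post */

typedef struct {
    uint32_t xs;   /* sum of DATA[i] * x */
    uint32_t ys;   /* sum of DATA[i] * y */
    uint32_t sum;  /* sum of DATA[i] */
} moments_t;

/* Same arithmetic as the loop in the post.
 * Assumption: DATA holds uint8_t values (not stated in the post).
 * The uint32_t accumulators wrap modulo 2^32 if a total exceeds that range. */
moments_t moments_naive(const uint8_t *data)
{
    moments_t m = {0, 0, 0};
    uint32_t i = 0;
    for (uint32_t x = 0; x < N; x++) {
        for (uint32_t y = 0; y < N; y++) {
            m.xs  += data[i] * x;
            m.ys  += data[i] * y;
            m.sum += data[i];
            i++;
        }
    }
    return m;
}

/* Possible restructuring: x is constant within a row, so the row total can be
 * accumulated first and multiplied by x once per row. This leaves one multiply
 * and two additions per element in the inner loop. */
moments_t moments_rowsum(const uint8_t *data)
{
    moments_t m = {0, 0, 0};
    const uint8_t *p = data;
    for (uint32_t x = 0; x < N; x++) {
        uint32_t row = 0;               /* sum of the current row */
        for (uint32_t y = 0; y < N; y++) {
            uint32_t v = *p++;
            row  += v;
            m.ys += v * y;
        }
        m.xs  += row * x;               /* equals the sum of v * x over the row */
        m.sum += row;
    }
    return m;
}
```

Both functions return identical values, including when the 32-bit accumulators wrap, because unsigned arithmetic is performed modulo 2^32 in both cases.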
Any insight into how I can make this faster? Optimization is already at the highest setting, and I am getting a 33 Hz loop rate including data acquisition.
As a last resort I could go to inline ASM, but I don't understand how to tell which CPU registers the compiler is already using, or how to keep values in registers for longer.
