Graduate
December 6, 2023
Question

STM32U5 SIMD usage (or any other trick to speedup math)


Hello,
In my application I need to run a double for loop over data stored in SRAM, and I would love to speed up my program as much as possible.

uint32_t xs = 0, ys = 0, sum = 0, i = 0;

for(uint32_t x = 0 ; x<600 ; x++)
{
    for(uint32_t y = 0 ; y<600 ; y++)
    {
        xs+=DATA[i]*x;
        ys+=DATA[i]*y;
        sum+=DATA[i];
        i++;
    }
}

Any insight into how I can make this faster? Optimization is already at the highest setting, and I am getting a 33 Hz loop rate with data acquisition.
My last resort is inline ASM, but I don't understand how to tell whether the compiler is already using particular CPU registers that could hold values longer.

 

 


    1 reply

    Super User
    December 6, 2023

    You can also get a sense for how optimized a loop is by looking at the generated assembly code. Use that to guide how the code is written.

    You can use a pointer to DATA instead of an index. Might save a little.

    const uint32_t* ptr = &DATA[0];

    ...

    xs += *ptr * x;

    ...

    ++ptr;
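
A minimal sketch of the pointer version as a complete function, assuming DATA holds uint32_t values (as the pointer type above suggests) in the same order the question's loop reads them. The names moments_t and accumulate_ptr are hypothetical, and the dimensions are passed in as parameters rather than fixed at 600. Whether this beats the indexed loop depends on the compiler, so compare the generated assembly.

```c
#include <stdint.h>

/* Hypothetical result container; the question keeps three loose variables. */
typedef struct {
    uint32_t xs;
    uint32_t ys;
    uint32_t sum;
} moments_t;

/* Walks the buffer with a pointer instead of indexing DATA[i].
 * rows is the outer loop count (x), cols the inner loop count (y).
 * Accumulators wrap modulo 2^32, exactly like the original loop. */
moments_t accumulate_ptr(const uint32_t *data, uint32_t rows, uint32_t cols)
{
    moments_t m = {0u, 0u, 0u};
    const uint32_t *ptr = data;

    for (uint32_t x = 0; x < rows; x++) {
        for (uint32_t y = 0; y < cols; y++) {
            const uint32_t v = *ptr++;  /* load once, reuse three times */
            m.xs  += v * x;
            m.ys  += v * y;
            m.sum += v;
        }
    }
    return m;
}
```

For the loop in the question this would be called as accumulate_ptr(DATA, 600u, 600u).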

     

    Visitor II
    December 6, 2023

    Also try:

    • Manual loop unrolling (or use CMSIS-DSP if possible)
    • Place DATA in SRAM and temporary variables in DTCMRAM
    • Take a look at the FMAC/DMA of the STM32U5 if DATA is not 32 bits wide.
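
To illustrate the manual unrolling suggestion, here is a rough sketch that processes four elements per inner iteration, assuming uint32_t data laid out as in the question. The names sums_t and accumulate_unrolled are hypothetical, and the unroll factor of 4 is an arbitrary choice. A compiler at a high optimization level may already unroll the loop by itself, so measure before keeping this.

```c
#include <stdint.h>

/* Hypothetical result container for the three accumulators. */
typedef struct {
    uint32_t xs;
    uint32_t ys;
    uint32_t sum;
} sums_t;

/* Same arithmetic as the question's loop, unrolled by 4 in the inner loop.
 * All sums wrap modulo 2^32, so the results match the original loop. */
sums_t accumulate_unrolled(const uint32_t *data, uint32_t rows, uint32_t cols)
{
    sums_t s = {0u, 0u, 0u};
    const uint32_t *ptr = data;

    for (uint32_t x = 0; x < rows; x++) {
        uint32_t y = 0;

        /* Main body: four elements per iteration. */
        for (; cols - y >= 4u; y += 4u) {
            const uint32_t a = ptr[0];
            const uint32_t b = ptr[1];
            const uint32_t c = ptr[2];
            const uint32_t d = ptr[3];
            const uint32_t group = a + b + c + d;
            ptr += 4;

            s.sum += group;
            s.xs  += group * x;  /* x is constant across the group */
            s.ys  += a * y + b * (y + 1u) + c * (y + 2u) + d * (y + 3u);
        }

        /* Remainder: handles cols that are not a multiple of 4. */
        for (; y < cols; y++) {
            const uint32_t v = *ptr++;
            s.sum += v;
            s.xs  += v * x;
            s.ys  += v * y;
        }
    }
    return s;
}
```

With 600 columns the remainder loop never runs, since 600 is a multiple of 4. Fewer loop-counter updates and branches per element is where any gain would come from.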