STM32L432KC basic DSP loop. Any way to improve this?
I am using a Nucleo STM32L432KC board with STM32CubeIDE v1.12.1, doing very basic DSP operations on data values read from the ADC through DMA into a circular buffer. It works, but the math is slower than expected.
Below is an example loop from my code. All variables are declared as type int, except for adc_buffer[], which is uint16_t.
HAL_GPIO_WritePin(GPIOB, GPIO_PIN_3, GPIO_PIN_SET); // GPIO signal flag
for (int i = idxStart; i < idxEnd; i++) {
    x = adc_buffer[i];
    sum += x;
}
HAL_GPIO_WritePin(GPIOB, GPIO_PIN_3, GPIO_PIN_RESET); // GPIO signal flag
// high for 1.05 ms: 1 loop time = 262.5 ns = 21 clocks @ 80 MHz

The for loop in this case runs through 4000 iterations (half of my 8000-element buffer). Based on the GPIO 3 output pulse on my scope, 4000 iterations take 1.05 ms, so one iteration is 262.5 ns. The CPU clock is f = 80 MHz (t = 12.5 ns), so one iteration takes 21 clocks.

Looks to me like I'm doing only four operations: an increment and conditional branch for the loop, fetching a 16-bit number, and adding it to a 32-bit sum. Is there any way to do that in less than 21 clocks, or is this the best it can do? I did try removing the explicit intermediate variable x and writing sum += adc_buffer[i]; but the timing did not change.
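The timing arithmetic above can be written out as a small host-side check. This is only a sketch: the function name clocks_per_iteration and its parameters are made up for illustration and are not part of the HAL or of my project.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured GPIO pulse width into CPU clocks per loop iteration.
 * Hypothetical helper, for illustration only.
 *   pulse_ns:   GPIO high time in nanoseconds, as read from the scope
 *   iterations: number of loop passes executed during the pulse
 *   cpu_hz:     core clock frequency in Hz
 */
static double clocks_per_iteration(double pulse_ns, uint32_t iterations, double cpu_hz)
{
    assert(iterations > 0);                                   /* avoid dividing by zero */
    double ns_per_iteration = pulse_ns / (double)iterations;  /* 1.05e6 / 4000 = 262.5 ns */
    return ns_per_iteration * cpu_hz / 1e9;                   /* ns * (clocks per ns) */
}
```

With the numbers measured here, clocks_per_iteration(1.05e6, 4000, 80e6) gives 21. The figure also includes the two HAL_GPIO_WritePin calls, but spread over 4000 iterations their contribution is negligible.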
This device has a 5 MHz ADC. Eventually I want to count analog input peaks above some threshold, and track their amplitude, but it doesn't look like I can do very much math per sample in real time.
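To make the eventual per-sample workload concrete, here is a minimal sketch of the kind of peak counting meant above. It assumes a "peak" is one contiguous run of samples above the threshold and that its amplitude is the largest sample in that run; there is no hysteresis or noise filtering. The names peak_state_t and count_peaks are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State kept between calls, so a peak that straddles two half-buffers
 * is still counted once. Hypothetical type, for illustration only. */
typedef struct {
    uint32_t count;        /* number of completed peaks so far */
    uint16_t last_peak;    /* amplitude of the most recently completed peak */
    uint16_t running_max;  /* largest sample seen in the current excursion */
    int      above;        /* nonzero while the signal is above the threshold */
} peak_state_t;

/* Scan n samples and update the peak statistics in *st.
 * Assumption: a peak ends when a sample falls back to or below the threshold. */
static void count_peaks(peak_state_t *st, const uint16_t *buf, size_t n, uint16_t threshold)
{
    assert(st != NULL && buf != NULL);
    for (size_t i = 0; i < n; i++) {
        uint16_t x = buf[i];
        if (x > threshold) {
            if (!st->above) {                 /* rising crossing: start a new excursion */
                st->above = 1;
                st->running_max = x;
            } else if (x > st->running_max) { /* still above: track the maximum */
                st->running_max = x;
            }
        } else if (st->above) {               /* falling crossing: the peak is complete */
            st->above = 0;
            st->count++;
            st->last_peak = st->running_max;
        }
    }
}
```

The state struct would be zero-initialised once, and count_peaks would then be called on each half of the circular buffer as the DMA half-complete and complete events arrive. That is one compare and one or two branches per sample on top of the load, which is why the clocks-per-iteration figure matters.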
