Optimizing DFT for loop
Is it normal for the function below (with arraySize = 32) to take approximately 125 µs to execute? The for loop is the main problem: it alone takes almost 120 µs.
I am using an STM32G474RET6U MCU with the clock at 170 MHz, and the optimization level is set to -O3.
Does anyone have an idea of which direction I should take to optimize it? Should I examine the generated assembly, or try a peripheral such as the FMAC?
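To put the timing in perspective, the measured durations can be converted into core clock cycles. The small helpers below are only an illustrative sketch (the names cycles_from_us and cycles_per_iteration are hypothetical); they assume the core really runs at 170 MHz, so that one microsecond corresponds to 170 cycles.

```c
#include <stdint.h>

/* Hypothetical helper: convert a duration in microseconds into core
 * clock cycles, given the core clock in MHz (cycles per microsecond). */
uint32_t cycles_from_us(uint32_t microseconds, uint32_t clock_mhz)
{
    return microseconds * clock_mhz;
}

/* Hypothetical helper: average cycles spent per loop iteration
 * (integer division, so the result is rounded down). */
uint32_t cycles_per_iteration(uint32_t total_cycles, uint32_t iterations)
{
    return total_cycles / iterations;
}
```

With the figures above, 120 µs at 170 MHz is 20400 cycles, or roughly 637 cycles for each of the 32 iterations, and each iteration only performs two multiply-accumulate steps.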
const float reComponent[32] = {
     1.000000,  0.923880,  0.707107,  0.382683,  0.000000, -0.382683, -0.707107, -0.923880,
    -1.000000, -0.923880, -0.707107, -0.382683, -0.000000,  0.382683,  0.707107,  0.923880,
     1.000000,  0.923880,  0.707107,  0.382683,  0.000000, -0.382683, -0.707107, -0.923880,
    -1.000000, -0.923880, -0.707107, -0.382683, -0.000000,  0.382683,  0.707107,  0.923880 };
const float imComponent[32] = {
     0.000000,  0.382683,  0.707107,  0.923880,  1.000000,  0.923880,  0.707107,  0.382683,
     0.000000, -0.382683, -0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683,
    -0.000000,  0.382683,  0.707107,  0.923880,  1.000000,  0.923880,  0.707107,  0.382683,
     0.000000, -0.382683, -0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683 };
float DFTphase(uint16_t* inputArray, int arraySize)
{
    //local variables
    float fkRe = 0;
    float fkIm = 0;
    float phase = 0;
    //Computing of Fourier series
    for (int n = 0; n < arraySize; n++)
    {
        fkRe = fkRe + (*inputArray - 2048.0) * reComponent[n];
        fkIm = fkIm + (*inputArray - 2048.0) * imComponent[n];
        //Assign address of next element to pointer inputArray
        inputArray++;
    }
    //Evaluation of phase; atan2f function returns angle in the interval [-PI,PI]
    phase = atan2f(fkRe, fkIm);
    return phase;
}
