Optimizing DFT for loop

Question

Is it normal for the function below (arraySize = 32) to take approximately 125 µs to execute? The for loop is the main problem: it alone takes almost 120 µs. I am using an STM32G474RET6U MCU with the clock at 170 MHz, and the optimization level is set to -O3. Does anyone have an idea which direction I should go in to optimize it? Should I examine the assembly code, or try to use a peripheral such as the FMAC?

const float reComponent[32] = {1.000000, 0.923880, 0.707107, 0.382683, 0.000000, -0.382683, -0.707107, -0.923880,
		-1.000000, -0.923880, -0.707107, -0.382683, -0.000000, 0.382683, 0.707107, 0.923880, 1.000000, 0.923880,
		0.707107, 0.382683, 0.000000, -0.382683,-0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683,
		-0.000000, 0.382683, 0.707107, 0.923880 };

const float imComponent[32] = {0.000000, 0.382683, 0.707107, 0.923880, 1.000000, 0.923880, 0.707107, 0.382683, 0.000000,
		-0.382683, -0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683, -0.000000, 0.382683, 0.707107,
		0.923880, 1.000000, 0.923880, 0.707107, 0.382683, 0.000000, -0.382683, -0.707107, -0.923880, -1.000000,
		-0.923880, -0.707107, -0.382683 };

float DFTphase(uint16_t* inputArray, int arraySize)
{

//local variables
 float fkRe=0;
 float fkIm=0;
 float phase=0;

//Computing of Fourier series
 for (int n = 0; n < arraySize; n++)
 {
 fkRe = fkRe + (*inputArray - 2048.0) * reComponent[n];
 fkIm = fkIm + (*inputArray - 2048.0) * imComponent[n];
 //Assign address of next element to pointer inputArray
 inputArray++;
 }

//Evaluation of phase; atan2f function returns angle in the interval [-PI,PI]
 phase= atan2f(fkRe,fkIm);

return phase;
}

AScha.3 · Accepted Answer

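Your input is uint16_t, so subtracting 2048 as an integer would be much faster, with the same precision as your 2048.0. The literal 2048.0 is a double, so the whole expression is evaluated in double precision, and the FPU on this core only handles single precision, so the double math is done in software. Try it and tell us how much faster it is.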
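To make the point about the literal concrete, here is a minimal sketch (the function names are made up for illustration). It relies only on the C conversion rules: a uint16_t combined with a double literal yields a double, while the same expression with an f suffix stays in float.

```c
#include <stddef.h>
#include <stdint.h>

/* Result type of "sample - 2048.0": the literal is a double,
   so the sample is converted and the result is a double. */
size_t size_with_double_literal(uint16_t sample)
{
    return sizeof(sample - 2048.0);
}

/* Result type of "sample - 2048.0f": the literal is a float,
   so the arithmetic stays in single precision. */
size_t size_with_float_literal(uint16_t sample)
{
    return sizeof(sample - 2048.0f);
}

/* Integer subtraction first, then a single conversion to float.
   The cast to a signed type keeps samples below 2048 negative. */
float centered_sample(uint16_t sample)
{
    return (float)((int32_t)sample - 2048);
}
```

In the original loop the double result is also multiplied by a float coefficient and added to a float accumulator, so every iteration pays for conversions in both directions on top of the double multiply and add.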

Also, the CORDIC can do the atan in about 140 ns (I tried it on an H563 at 250 MHz), if this helps.

Pavel A. · Answer

Also you can put the const arrays in RAM: fetching from RAM may be faster than flash.
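One way to try the RAM idea is to keep the const tables as they are and copy them into non-const arrays once at startup. This is only a sketch: the names reRam, imRam, dft_tables_to_ram and dft_accumulate_ram are hypothetical, and whether it helps depends on the flash wait states and caching, so it is worth measuring.

```c
#include <stdint.h>
#include <string.h>

#define DFT_N 32

/* Coefficients as posted in the question; const data usually ends up in flash. */
static const float reComponent[DFT_N] = {
     1.000000f,  0.923880f,  0.707107f,  0.382683f,  0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f, -0.000000f,  0.382683f,  0.707107f,  0.923880f,
     1.000000f,  0.923880f,  0.707107f,  0.382683f,  0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f, -0.000000f,  0.382683f,  0.707107f,  0.923880f };

static const float imComponent[DFT_N] = {
     0.000000f,  0.382683f,  0.707107f,  0.923880f,  1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f, -1.000000f, -0.923880f, -0.707107f, -0.382683f,
    -0.000000f,  0.382683f,  0.707107f,  0.923880f,  1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f, -1.000000f, -0.923880f, -0.707107f, -0.382683f };

/* Working copies without const, so the linker places them in RAM. */
float reRam[DFT_N];
float imRam[DFT_N];

/* Call once at startup, before the first DFT computation. */
void dft_tables_to_ram(void)
{
    memcpy(reRam, reComponent, sizeof reRam);
    memcpy(imRam, imComponent, sizeof imRam);
}

/* Same accumulation as in the question, but reading the coefficients from RAM. */
void dft_accumulate_ram(const uint16_t *x, float *re, float *im)
{
    float accRe = 0.0f;
    float accIm = 0.0f;

    for (int n = 0; n < DFT_N; n++)
    {
        float v = (float)((int32_t)x[n] - 2048);
        accRe += v * reRam[n];
        accIm += v * imRam[n];
    }

    *re = accRe;
    *im = accIm;
}
```

Simply dropping const from the original arrays would typically have the same effect, since the startup code then initializes them in RAM.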

I'd also suggest massaging the code a bit so it is easier on the reviewer's eyes:

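#include <assert.h>
#include <math.h>
#include <stdint.h>

float DFTphase(uint16_t* inputArray, int arraySize)
{
    assert(arraySize <= 32);
    float fkRe = 0;
    float fkIm = 0;

    for (int n = 0; n < arraySize; n++)
    {
        // Subtract as a signed integer: with 2048U the result is unsigned,
        // so samples below 2048 would wrap around to huge positive values
        float v = (float)((int)inputArray[n] - 2048);
        fkRe += v * reComponent[n];
        fkIm += v * imComponent[n];
    }

    // Evaluation of phase; atan2f returns an angle in the interval [-PI, PI]
    return atan2f(fkRe, fkIm);
}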
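A further direction, not proposed by anyone in the thread, comes from the posted tables themselves: their entries repeat after 16 values. Assuming arraySize is always 32, the two halves of the input can be added as integers first, which halves the number of float multiplications. The names below (DFT_HALF, reHalf, imHalf, dft_bin_folded) are hypothetical, and the tables are just the first 16 entries of the originals.

```c
#include <stdint.h>

#define DFT_HALF 16   /* period of the coefficient tables posted in the question */

static const float reHalf[DFT_HALF] = {
     1.000000f,  0.923880f,  0.707107f,  0.382683f,  0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f,  0.000000f,  0.382683f,  0.707107f,  0.923880f };

static const float imHalf[DFT_HALF] = {
     0.000000f,  0.382683f,  0.707107f,  0.923880f,  1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f, -1.000000f, -0.923880f, -0.707107f, -0.382683f };

/* x must point to 2 * DFT_HALF = 32 samples.
   Since c[n + 16] == c[n], x[n]*c[n] + x[n+16]*c[n+16] == (x[n] + x[n+16]) * c[n]. */
void dft_bin_folded(const uint16_t *x, float *re, float *im)
{
    float accRe = 0.0f;
    float accIm = 0.0f;

    for (int n = 0; n < DFT_HALF; n++)
    {
        /* Fold the two halves in integer arithmetic and remove both offsets at once. */
        int32_t s = (int32_t)x[n] + (int32_t)x[n + DFT_HALF] - 2 * 2048;
        float v = (float)s;
        accRe += v * reHalf[n];
        accIm += v * imHalf[n];
    }

    *re = accRe;
    *im = accIm;
}
```

The phase would then be computed from the two sums exactly as before. The results can differ from the 32-term loop in the last bits because the float additions happen in a different order.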
