Graduate
December 29, 2023
Solved

Optimizing DFT for loop


Is it normal for the function below (arraySize = 32) to take approximately 125 µs to execute? The for loop is the main problem, as it alone takes almost 120 µs.
I am using an STM32G474RET6U MCU with the clock at 170 MHz, and optimization is set to -O3.
Does anyone have an idea which direction I should go in to optimize it? Should I examine the assembly code, or try to use a peripheral such as the FMAC?

const float reComponent[32] = {1.000000, 0.923880, 0.707107, 0.382683, 0.000000, -0.382683, -0.707107, -0.923880,
		-1.000000, -0.923880, -0.707107, -0.382683, -0.000000, 0.382683, 0.707107, 0.923880, 1.000000, 0.923880,
		0.707107, 0.382683, 0.000000, -0.382683,-0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683,
		-0.000000, 0.382683, 0.707107, 0.923880 };

const float imComponent[32] = {0.000000, 0.382683, 0.707107, 0.923880, 1.000000, 0.923880, 0.707107, 0.382683, 0.000000,
		-0.382683, -0.707107, -0.923880, -1.000000, -0.923880, -0.707107, -0.382683, -0.000000, 0.382683, 0.707107,
		0.923880, 1.000000, 0.923880, 0.707107, 0.382683, 0.000000, -0.382683, -0.707107, -0.923880, -1.000000,
		-0.923880, -0.707107, -0.382683 };

float DFTphase(uint16_t* inputArray, int arraySize)
{

    // Local variables
    float fkRe = 0;
    float fkIm = 0;
    float phase = 0;

    // Computing of Fourier series
    for (int n = 0; n < arraySize; n++)
    {
        fkRe = fkRe + (*inputArray - 2048.0) * reComponent[n];
        fkIm = fkIm + (*inputArray - 2048.0) * imComponent[n];
        // Advance the pointer to the next element of inputArray
        inputArray++;
    }

    // Evaluation of phase; atan2f returns an angle in the interval [-PI, PI]
    phase = atan2f(fkRe, fkIm);

    return phase;
}

 

    2 replies

    AScha.3 (Answer)
    Super User
    December 29, 2023

    Your input is uint16_t, so subtracting 2048 as an integer would be much faster than subtracting 2048.0, which is a double literal, and it gives the same precision. Try it and tell us how much faster it is.

    Also, the CORDIC peripheral can compute the atan in about 140 ns (I measured that on an H563 at 250 MHz), if this helps.
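
To see why the literal matters, here is a minimal sketch (the helper names `term_double` and `term_int` are hypothetical, not from the thread). In C, `2048.0` has type double, so `sample - 2048.0` and the multiplication that follows are evaluated in double precision. The Cortex-M4 FPU in the STM32G4 supports single precision only, so double arithmetic is carried out by software routines, which would be consistent with the slow loop reported in the question.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: 2048.0 is a double literal, so the subtraction and the
   multiplication are done in double and then narrowed back to float. */
static float term_double(uint16_t sample, float coeff)
{
    return (sample - 2048.0) * coeff;
}

/* Suggested form: subtract as a signed integer, convert to float once,
   then multiply in single precision. */
static float term_int(uint16_t sample, float coeff)
{
    return (float)((int32_t)sample - 2048) * coeff;
}

/* The type of each subtraction shows where the promotion happens. */
static_assert(sizeof((uint16_t)0 - 2048.0) == sizeof(double),
              "a double literal promotes the expression to double");
static_assert(sizeof((uint16_t)0 - 2048.0f) == sizeof(float),
              "a float literal keeps the expression in float");
```

For 12-bit ADC samples both helpers should return the same value for a single term; the difference is in how much work the CPU does to get there.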

    GHrib.1 (Author)
    Graduate
    December 29, 2023

    Crazy :D. The whole function now executes in approximately 8 µs (before it was 126 µs), and I am more than pleased with that :D. I double-checked because I couldn't believe it.
    Thank you.

    Super User
    December 29, 2023

    Also you can put the const arrays in RAM: fetching from RAM may be faster than flash.
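
One way to try the RAM suggestion is sketched below, as an assumption-laden illustration and not the poster's code. It keeps a single 16-entry period of each table as const data and copies it into non-const working arrays at startup. The names `reComponentFlash`, `reComponentRam`, `copyTablesToRam` and their `im` counterparts are hypothetical. Where const and non-const data actually end up depends on the linker script, and whether RAM is faster depends on the flash wait states and cache settings, so it is worth measuring.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 32   /* table length used in the thread */
#define PERIOD     16   /* the posted tables repeat every 16 entries */

static_assert(TABLE_SIZE % PERIOD == 0, "table must hold whole periods");

/* One period of each posted table; const data normally stays in flash. */
static const float reComponentFlash[PERIOD] = {
     1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f,
     0.000000f,  0.382683f,  0.707107f,  0.923880f };

static const float imComponentFlash[PERIOD] = {
     0.000000f,  0.382683f,  0.707107f,  0.923880f,
     1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f };

/* Non-const working copies; a typical toolchain places these in RAM. */
static float reComponentRam[TABLE_SIZE];
static float imComponentRam[TABLE_SIZE];

/* Call once at startup, outside the time-critical path. */
static void copyTablesToRam(void)
{
    for (size_t n = 0; n < TABLE_SIZE; n++) {
        reComponentRam[n] = reComponentFlash[n % PERIOD];
        imComponentRam[n] = imComponentFlash[n % PERIOD];
    }
}
```

The DFT loop would then index `reComponentRam` and `imComponentRam` in place of the const tables.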

    I suggest you massage the code a bit so it doesn't hurt the reviewer's eyes...

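    #include <assert.h>
    #include <math.h>
    #include <stdint.h>

    float DFTphase(uint16_t* inputArray, int arraySize)
    {
        assert(arraySize <= 32);
        float fkRe = 0;
        float fkIm = 0;

        for (int n = 0; n < arraySize; n++)
        {
            // Signed constant: with 2048U the subtraction would be done in
            // unsigned arithmetic and wrap around for samples below 2048.
            float v = (float)(inputArray[n] - 2048);
            fkRe += v * reComponent[n];
            fkIm += v * imComponent[n];
        }

        // Evaluation of phase; atan2f returns an angle in the interval [-PI, PI]
        return atan2f(fkRe, fkIm);
    }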

     

    Super User
    December 29, 2023

    massage

    😂