Explorer
September 24, 2024
Question

Time Delay in FFT performance


Hello community,
The CMSIS DSP library supports FFTs of up to 4096 points, but my project requires a 16384-point FFT. I added some extra library files (C sources downloaded from GitHub) to my project and successfully performed a 16384-point FFT.
However, the FFT takes far too long: around 27 to 30 seconds. Does anyone have a way to make it more efficient and reduce the execution time?

 

    This topic has been closed for replies.

    4 replies

    Super User
    September 24, 2024

    Hi,

    A 16k FFT needs about N*log2(N) butterfly operations, i.e. 16384 * 14, or about 230k. Assuming the CPU can do about 4 million of these per second, your FFT should run in roughly 100 ms, not 30 s.

    What are we talking about: is the FFT done in place? Is the input real-only data? Is the data/FFT fixed-point 16-bit, float, or double?
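    The estimate above can be reproduced with a few lines of C. This is only a sketch of the rule of thumb from the post: the function names are made up for illustration, and the 4 million operations per second is the assumed throughput quoted in the reply, not a measured figure for any particular MCU.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 for a power-of-two FFT length (e.g. 16384 -> 14). */
static unsigned log2_pow2(uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0); /* only powers of two are valid */
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

/* Rough operation count used in the reply: N * log2(N). */
uint64_t fft_op_estimate(uint32_t n)
{
    return (uint64_t)n * log2_pow2(n);
}

/* Estimated run time in milliseconds at an ASSUMED throughput
 * (operations per second); this is a model, not a measurement. */
double fft_time_estimate_ms(uint32_t n, double ops_per_second)
{
    return 1000.0 * (double)fft_op_estimate(n) / ops_per_second;
}
```

    For example, fft_op_estimate(16384) is 229376 (the "about 230k" above), and fft_time_estimate_ms(16384, 4.0e6) comes out at roughly 57 ms: the same order of magnitude as the 100 ms quoted, and far from 27 s.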

     

    To get the best speed on ARM, you have to use an optimized FFT; the CMSIS DSP library should be exactly that.

    Then your data: in RAM, with all caches on (I+D).

    And compile the code with the optimizer on, -O2 or -Ofast.

    SA V.1Author
    Explorer
    September 24, 2024

    "16k FFT is about N*log(N) butterfly ops -> about 230k; assuming the CPU can do 4 million/s, your FFT should run in about 100 ms or so, not 30 s." ---> I didn't understand this calculation.

    The FFT input is float. I am using an ARM Cortex-M7 (STM32H745xx MCU). The CMSIS DSP library limit is 4096 points, but my project requires 16384 points.

     

    Super User
    September 24, 2024

    @SA V.1 wrote:

    FFT input is float,


    Beware that standard C libraries often use double. Does the H7's floating point unit support double?

    If not, the calculations will be done in software...
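    As a small illustration of how double can creep into float code in C, here is a host-side sketch. The function names are hypothetical; the point is only the standard C type rules: an unsuffixed literal such as 0.5 is a double, and sin() from <math.h> takes and returns double, whereas 0.5f and sinf() stay in single precision.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* sizeof does not evaluate its operand; it only reports the size of the
 * expression's result type, so these show which type the compiler picks. */
size_t bytes_float_times_plain_literal(void) { return sizeof(1.0f * 0.5); }
size_t bytes_float_times_f_literal(void)     { return sizeof(1.0f * 0.5f); }
size_t bytes_sin_of_float(void)              { return sizeof(sin(1.0f)); }
size_t bytes_sinf_of_float(void)             { return sizeof(sinf(1.0f)); }

/* Runtime check of the promotion rules described above. */
void check_float_vs_double(void)
{
    /* float * double literal -> the whole expression is double */
    assert(bytes_float_times_plain_literal() == sizeof(double));
    /* float * float literal -> stays float */
    assert(bytes_float_times_f_literal() == sizeof(float));
    /* sin() returns double even for a float argument */
    assert(bytes_sin_of_float() == sizeof(double));
    /* sinf() is the single-precision variant */
    assert(bytes_sinf_of_float() == sizeof(float));
}
```

    When reviewing third-party FFT sources, it may be worth looking for exactly these patterns (plain literals, sin/cos instead of sinf/cosf) in the inner loops.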

     


    @SA V.1 wrote:

     

    I added some extra library files (downloaded from GitHub...


    So give a link.

    Are those libraries intended for embedded microcontrollers?

    SA V.1Author
    Explorer
    September 24, 2024

    "Does the H7's floating point unit support double?"

    Yes ---> "The Arm® Cortex®-M7 with double-precision FPU processor is the latest generation of Arm processors for embedded systems."

    I added these library files, performed the FFT, and got a result, but the problem is that it takes too much time. Is there any solution?

    Link ---> https://github.com/Treeed/Long_FFTs_for_CMSIS_DSP/tree/master

    Super User
    September 24, 2024
    SA V.1Author
    Explorer
    October 3, 2024

    With the CMSIS DSP library, a 4K FFT takes 1.6 seconds to execute. With the I+D caches enabled, it takes 440 ms. However, when I add the extra DSP library files to perform a 16,384-point FFT, it takes 27 seconds to execute. With the I+D caches enabled, it takes 7 seconds. Is there any solution?

    Super User
    October 3, 2024

    So, with cache:

    •  4,096 points takes 440 ms;
    • 16,384 points takes 7s

    Without cache:

    •  4,096 points takes 1.6s;
    • 16,384 points takes 27s

    In both cases, the difference is roughly a factor of 16:

    16K points is 4 times as many as 4K points;

    4 squared is 16.

    Is it a surprise that multiplying the number of points by X multiplies the execution time by X squared?
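    To put the factor of 16 next to the N*log(N) estimate quoted earlier in the thread, here is a short C sketch that computes the time ratio each cost model would predict when going from 4,096 to 16,384 points. The function names are illustrative only, and both models are simplifications that ignore cache and memory effects.

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 for a power-of-two length (e.g. 4096 -> 12). */
static unsigned ilog2_pow2(uint32_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0); /* only powers of two are valid */
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

/* Predicted time ratio if cost grows like N * log2(N). */
double ratio_nlogn(uint32_t n_small, uint32_t n_large)
{
    return ((double)n_large * ilog2_pow2(n_large)) /
           ((double)n_small * ilog2_pow2(n_small));
}

/* Predicted time ratio if cost grows like N squared. */
double ratio_quadratic(uint32_t n_small, uint32_t n_large)
{
    double r = (double)n_large / (double)n_small;
    return r * r;
}

/* Ratio of two measured times (any unit, as long as both match). */
double ratio_measured(double t_small, double t_large)
{
    return t_large / t_small;
}
```

    With the numbers from this thread, ratio_quadratic(4096, 16384) is 16 and ratio_nlogn(4096, 16384) is about 4.7, while the measured ratios are about 15.9 with cache (7 s / 0.44 s) and about 16.9 without (27 s / 1.6 s). The measurements sit much closer to the quadratic model than to the N*log(N) estimate.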