Visitor II
November 10, 2023
Solved

FatFs unreliable under high load on STM32H743

  • November 10, 2023
  • 8 replies
  • 6917 views

Hello,

I am using a STM32H743 with a W25Q128 external flash memory running FreeRTOS and FatFs. USB is configured for mass storage, to copy files from a computer to the flash memory. The following function is used to list all the files and directories in a directory and put them in a list:

 

 

void IndexFilesInDir(const char *path, SFileMgrList* psReturnList)
{
    FRESULT res;
    DIR dir;
    FILINFO fno;
    uint8_t u8Count = 0;

    res = f_opendir(&dir, path); /* Open the directory */
    if (strlen(path) != 0)
    {
        strcpy(&psReturnList->cFilename[0][0], "...");
        psReturnList->bIsDir[0] = true;
        psReturnList->bHasParentDirectory = true;
        u8Count++;
    }
    else
    {
        psReturnList->bHasParentDirectory = false;
    }
    if (res == FR_OK) {
        for (;;) {
            res = f_readdir(&dir, &fno); /* Read a directory item */
            if (res != FR_OK || fno.fname[0] == 0 || u8Count >= 128) break; /* Error or end of dir */
            if ((fno.fattrib & AM_DIR) && strstr(fno.fname, "System Volume Information") == NULL)
            {
                psReturnList->bIsDir[u8Count] = true;
                strcpy(&psReturnList->cFilename[u8Count][0], fno.fname);
                u8Count++;
            }
            else
            {
                char *dot = strrchr(fno.fname, '.');
                // Is this a .wav file?
                if (dot != NULL && strcmp(dot, ".wav") == 0)
                {
                    psReturnList->bIsDir[u8Count] = false;
                    strcpy(&psReturnList->cFilename[u8Count][0], fno.fname);
                    u8Count++;
                }
            }
        }
        psReturnList->endIndex = u8Count;
        f_closedir(&dir);
    } else {
        printf("Failed to open \"%s\". (%u)\n", path, res);
    }
}

 

 

This is based on the example from the FatFs website. Everything works fine until I start processing some audio data (= lots of SAI DMA interrupts). Then, from time to time, f_readdir returns random garbage, does not list all the files, or lists files from another directory. Here's what I've tried and observed:

- Disabling the caches doesn't solve or change the problem.

- Reading and writing files via USB always works perfectly fine, even under high load conditions. I conclude from this that my low level SPI driver for the W25Q128 works fine.

- FatFs is only used in one Task.

- The microcontroller is not overloaded, other RTOS Tasks still work fine. Temporarily removing all the other tasks does not help or change anything.

 

Does anyone have any idea what could be causing this behavior?

Thanks!

    This topic has been closed for replies.
    Best answer by JP_ama

    Update: The problem was solved by implementing large data transfer via DMA when reading a sector.

    8 replies

    Technical Moderator
    November 12, 2023

    Hello @JP_ama 

    For the moment, the issue is unclear. 

    Then f_readdir returns random garbage or does not list all the files or it lists files from another directory from time to time. 

    It seems to be independent of the platform and possibly linked to the implementation. One thing to try is to disable the SAI interrupts in the file system task.

    JP_amaAuthor
    Visitor II
    November 12, 2023

    Thank you @FBL 

    So, do I understand you correctly that this is a known issue?

    Deactivating SAI interrupts in the file system task is not really an option, as this would lead to dropouts in the audio stream.

    Thanks!

    Super User
    November 12, 2023

    >under high load

    can you tell - 

    - data rate on SAI (in or out) + mode (INT , DMA (+ blocksize then))

    - data rate on fatfs -> USB stick (i assume)

    - int priorities you set 

    JP_amaAuthor
    Visitor II
    November 13, 2023

    Hi @AScha.3 

    SAI is running at 48 kHz or (worst case) 96 kHz with 24 bit. That means 4 bytes per channel = 8 bytes per sample = 768 kByte/s. Mode is DMA and the interrupt is triggered for every sample, because I am doing some zero latency FIR filtering.
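The figures above can be checked with a few lines of arithmetic. This is only a sketch of the calculation, using the stereo layout described in the post (2 channels, 4 bytes per channel); the function names are made up for illustration.

```c
#include <stdint.h>

/* Bytes per second of an audio stream: frames/s * channels * bytes per slot. */
uint32_t audio_bytes_per_second(uint32_t sample_rate_hz,
                                uint32_t channels,
                                uint32_t bytes_per_slot)
{
    return sample_rate_hz * channels * bytes_per_slot;
}

/* Time between two per-sample interrupts, in nanoseconds (rounded down). */
uint32_t sample_period_ns(uint32_t sample_rate_hz)
{
    return 1000000000u / sample_rate_hz;
}
```

With 96 kHz, 2 channels and 4 bytes per slot this gives 768000 bytes/s, matching the 768 kByte/s quoted above. With one interrupt per sample, the period at 96 kHz is about 10.4 µs (about 20.8 µs at 48 kHz), which is the window every other piece of code has to fit into.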

    Data rate for the USB Mass Storage is around 70 kByte/s write (the flash memory is quite slow) and up to 250 kByte/s read. These data rates still work for the audio running at 48 kHz. At 96 kHz it drops down to 30 kByte/s write for example. But it still is 100% reliable. I never get any read or write errors via USB with audio processing running at the same time.

    However, when I am just listing the files (not even reading or loading files) using FatFs on the STM32 while audio processing is running I get random garbage every second to fifth time I read a directory. So, the data rate is really low. When I stop the SAI DMA it's 100% reliable as well.

    And these are my NVIC settings:

    Screenshot 2023-11-13 113116.png

    Thanks!

     

    Super User
    November 13, 2023

    Hi, 

    its late now...but in brief : your problem is int priority.

    ie. i have on H743 (and now on H563 beginning) an audio player running. :)

    read from SDcard or USB stick, data out with SAI /DMA circular (same you, almost); with FatFS.

    when playing cd audio 44/16 , cpu load is about 3% ; with FLAC file at 96k and 32bit also no problem, cpu still under 40% load; and decoder + 2x biquad filter (double precision) needing most time here, cpu at 200MHz clock, to run cool.

    so "play" with your int priorities ! all on same level is nonsense.

    try: usb 4 , DMA 9+10, tim so important? if not, 12.

     

    Graduate II
    November 13, 2023

    I wouldn't assume DISKIO is bullet-proof from casual observation.

    You're using 4KB blocks on the QSPI?

    Would perhaps worry about thread-safe and concurrent operation of FATFS and DISKIO. I'd definitely recommend serializing access to SPI and QSPI FLASH resource. You say it's in one task, which presumably should achieve that.

    Would instrument DISKIO to better understand the interactions and failure.
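One way to instrument the disk I/O layer, as a rough sketch only: route every sector read through a wrapper that records what was requested and a hash of what came back in a small ring buffer, so the last few reads can be inspected after a bad directory listing. Everything here is hypothetical (the `sector_read_fn` signature, `traced_read`, `trace_get`, the trace depth); `demo_read` is just a host-side stand-in for the real flash driver so the sketch can be run on a PC.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 32u /* how many recent reads to keep (arbitrary choice) */

/* Hypothetical signature of the low-level read; returns 0 on success. */
typedef int (*sector_read_fn)(uint8_t *buf, uint32_t sector, uint32_t count);

typedef struct {
    uint32_t sector; /* first sector requested */
    uint32_t count;  /* number of sectors requested */
    uint32_t hash;   /* FNV-1a hash of the returned bytes */
    int      result; /* return code of the driver */
} read_trace_t;

static read_trace_t g_trace[TRACE_DEPTH];
static uint32_t     g_trace_total; /* number of reads recorded so far */

/* FNV-1a is order-sensitive, so missing or repeated sections are very
 * likely to change the value. */
static uint32_t fnv1a(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Perform the read through read_fn and record it in the ring buffer. */
int traced_read(sector_read_fn read_fn, uint8_t *buf,
                uint32_t sector, uint32_t count, size_t sector_size)
{
    int result = read_fn(buf, sector, count);
    read_trace_t *t = &g_trace[g_trace_total % TRACE_DEPTH];

    t->sector = sector;
    t->count  = count;
    t->hash   = fnv1a(buf, (size_t)count * sector_size);
    t->result = result;
    g_trace_total++;
    return result;
}

/* age 0 is the most recent read; returns NULL if that entry does not exist. */
const read_trace_t *trace_get(uint32_t age)
{
    if (age >= g_trace_total || age >= TRACE_DEPTH) {
        return NULL;
    }
    return &g_trace[(g_trace_total - 1u - age) % TRACE_DEPTH];
}

/* Host-side stand-in for the flash driver: 512-byte sectors, pattern data. */
int demo_read(uint8_t *buf, uint32_t sector, uint32_t count)
{
    for (size_t i = 0; i < (size_t)count * 512u; i++) {
        buf[i] = (uint8_t)(sector + i);
    }
    return 0;
}
```

If the same sector is read twice and the two hashes differ while the medium is read-only, the fault is below the filesystem. A timestamp field could be added to each entry to correlate failures with audio interrupts.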

    Would get a current version of FATFS, not 2017 version ST ships.

    JP_amaAuthor
    Visitor II
    November 13, 2023

    Hi @Tesla DeLorean 

    Yes, I am using 4 kByte blocks. The W25Q128 is running in standard SPI configuration, not Quad SPI. It's fast enough for my application. The filesystem task is not accessing the flash memory via FatFs when USB is connected (and the other way around).

     


    @Tesla DeLorean wrote:

    Would instrument DISKIO to better understand the interactions and failure.

    How could I do that?

    I will try and replace FatFs with the latest version.

     

    Thank you!

    JP_amaAuthor
    Visitor II
    November 14, 2023

    @jamesoleg2 

    The SAI DMA interrupt is the only relevant interrupt. I used one timer to benchmark execution times. I disabled it just to make sure, no difference. USB completely disabled doesn't change anything either.

    The file manager task has the lowest FreeRTOS priority. There are two other DSP tasks which obviously need a higher priority as these are critical for the audio processing. Those tasks receive notifications from the DMA ISR when a new sample block has been acquired. I've had the impression that disabling these two tasks makes the problem appear a little less frequently, but listing directory contents is still unreliable.

    I am using this code for the W25Q128 SPI: https://github.com/nimaltd/w25qxx 

    I don't see any resource conflicts there. I ensured task stacks are big enough. I double checked for buffer overflows, no issues.

    As mentioned before, without the SAI DMA running no problems. But that also means the microcontroller is essentially idle. It's hard for me to debug this as this is not happening all the time. Any tips or tricks there? Also, when stepping through the code it happens less frequently. All I can say so far is that f_readdir is not returning the correct data. It's not completely random garbage like it would be the case with memory corruption. Usually it's simply not listing all the files, or it's listing a bunch of files mixed with files from another directory. Listing a bunch of folders without a name is common, too.

    @AScha.3 

    I don't understand how interrupt priorities could be the issue here. But I tried what you suggested and also tried different combinations. No success. SPI communication with the W25Q128 is not in interrupt mode. I disabled the timer, it's not needed anyways. USB enabled or completely removed makes no difference either. Also, since a lower number means higher priority, wouldn't it make more sense to keep the DMA interrupts low around 2 (= high priority) as these are the most important and time critical thing? And set the USB to a high number (= very low priority) since it is not time critical at all? 

    I just wanted to illustrate that the USB Mass Storage data transfer works 100% reliable, even under high load despite having a very low priority. And it shares almost identical code for the diskio. Whereas simply listing filenames (not a lot of data to read, only takes a very short amount of time) from the FreeRTOS task fails frequently.

     

    Thanks for all your suggestions and your help guys!

    Super User
    November 14, 2023

    @JP_ama  , ok . I just believed, it could be related to the int prio. , because this was very important in my setup.

    DMA is the most important thing in running a continuous data stream - right. but if buffers are big enough, i am using 8K int32 , every callback has up to 40ms time for service; so INT prio can be very low, on the other hand in this time gap data needs to be read (or written) from USB or SD, so this has to be as fast as possible - thats why USB or SD needs higher prio. (completely different than first imagined - but now obvious ).

    According to your new tests i have no idea whats the root cause.

    Just - maybe, you should try a more recent version of fatfs (if you use the version by STM - its 6 y old!)

    http://elm-chan.org/fsw/ff/00index_e.html

     

    Graduate II
    November 14, 2023

    Just curious, what's your _SYNC_t?

    JP_amaAuthor
    Visitor II
    November 14, 2023

    @AScha.3 

    So I integrated the latest version of FatFs into my project. Not only did that not solve the problem, it actually made it worse. Reading directory contents now fails more frequently.

    @David Littell

    _SYNC_t is osSemaphoreId_t

    However, it doesn't make any difference whether FS_REENTRANT is enabled or disabled and according to the FatFs website reentrancy should be irrelevant in my case anyways because I am using only one volume (http://elm-chan.org/fsw/ff/doc/config.html#fs_reentrant). But I've tested both, without success.

    Super User
    November 14, 2023

    >it actually made it worse

    What a pity. I use it , because the filex i had to use on new H563 with Azure RTOS made only problems.

    Fatfs running fine . here. 

    i dont know...but you said: Reading and writing files via USB always works perfectly fine. - so here you dont use fatfs ? because if fatfs here no problem, then the reason is not fatfs at all.

    JP_amaAuthor
    Visitor II
    November 14, 2023

    @AScha.3 

    No, I don't use FatFs with the USB Mass Storage Class because there is no need or reason to do so. The USB Mass Storage Class only uses my SPI driver read/write/erase functions for the W25Q128. Works great, that's why I suspected FatFs is the problem in the first place. FatFs diskio is using the exact same read/write/erase functions in my driver.

    JP_amaAuthor
    Visitor II
    November 15, 2023

    So I tried something else. I went back to my previous Nucleo board with the STM32F439. The code is the same, just with shorter filter lengths because the F4 is obviously not as powerful as an H7. And the F4 doesn't show this weird behavior at all, even at much higher processor loads. The only thing I noticed was that on the F4 it won't list more than eight files in a directory with very high loads (>90%) at 96k. But never random stuff or nameless directories that don't exist. Which already happens at 48k with the H7 (with longer filters but less cpu load though).

    JP_amaAuthor
    Visitor II
    January 15, 2024

    Alright, so I have finally found the time to continue work on this project and did some further testing. To verify the low level Disk IO driver is working properly I added some additional testing:

    1.) Preload the relevant sectors containing the FAT tables at the beginning of the program and copy them into some section of RAM (guaranteed correct data, identified by sector number)

    2.) Start the signal processing

    3.) Whenever FatFs is calling the ReadSector() function, compare the sector data that is being read with the preloaded sectors stored in RAM
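The three steps above could be sketched roughly as follows. This is not the code used in the project; the names (`golden_store`, `golden_first_mismatch`), the number of slots, and the return codes are assumptions for illustration. Only the 4 kByte sector size comes from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE    4096u /* 4 kByte blocks, as stated in the thread */
#define GOLDEN_SLOTS   8u    /* hypothetical: how many sectors to preload */
#define GOLDEN_MATCH   (-1)  /* data equals the reference copy */
#define GOLDEN_UNKNOWN (-2)  /* no reference copy stored for this sector */

typedef struct {
    bool     used;
    uint32_t sector;
    uint8_t  data[SECTOR_SIZE];
} golden_sector_t;

static golden_sector_t g_golden[GOLDEN_SLOTS]; /* about 32 KB of RAM */

/* Step 1: store a known-good copy of a sector before the load starts. */
bool golden_store(uint32_t sector, const uint8_t *data)
{
    golden_sector_t *slot = NULL;

    for (size_t i = 0; i < GOLDEN_SLOTS; i++) {
        if (g_golden[i].used && g_golden[i].sector == sector) {
            slot = &g_golden[i]; /* overwrite the existing copy */
            break;
        }
        if (!g_golden[i].used && slot == NULL) {
            slot = &g_golden[i]; /* remember the first free slot */
        }
    }
    if (slot == NULL) {
        return false; /* table is full */
    }
    slot->used   = true;
    slot->sector = sector;
    memcpy(slot->data, data, SECTOR_SIZE);
    return true;
}

/* Step 3: compare a freshly read sector against the reference copy.
 * Returns GOLDEN_MATCH, GOLDEN_UNKNOWN, or the offset of the first
 * byte that differs. */
int32_t golden_first_mismatch(uint32_t sector, const uint8_t *data)
{
    for (size_t i = 0; i < GOLDEN_SLOTS; i++) {
        if (g_golden[i].used && g_golden[i].sector == sector) {
            for (size_t off = 0; off < SECTOR_SIZE; off++) {
                if (g_golden[i].data[off] != data[off]) {
                    return (int32_t)off;
                }
            }
            return GOLDEN_MATCH;
        }
    }
    return GOLDEN_UNKNOWN;
}
```

Calling `golden_first_mismatch()` at the end of the sector read function and dumping the buffer whenever it returns an offset shows where in the sector the data starts to go wrong.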

     

    Note: File contents are never changed during runtime, it's read-only. Unfortunately, it has turned out that the correct reading of these sectors occasionally fails. So, some of you guys were right, it's actually not a FatFs problem. However, I don't understand why the SPI communication is unstable under high load. I dumped the data when the data read did not match the data stored in RAM, and it showed that sometimes sections of a sector were missing, sections were repeated, etc. This explains the random garbage output when listing files.

    My idea now would be to perhaps rewrite the SPI Flash driver using DMA. Or does anyone have a different suggestion? I still don't understand why and how the data corruption is happening. The timings of the W25Qxx flash are not critical and should not have any problems with interruptions and waiting times.

     

    Thanks!

    JP_amaAuthorAnswer
    Visitor II
    April 17, 2024

    Update: The problem was solved by implementing large data transfer via DMA when reading a sector.
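The thread does not show the final driver, so the following is only a sketch of how such a read path might be structured: a short command phase, then the whole payload in one non-blocking transfer, with the task blocking until it completes. The `flash_bus_t` callbacks and `flash_read_bulk` are hypothetical. On the target, `start_rx` could for instance wrap `HAL_SPI_Receive_DMA()` and `wait_rx` could wait on a semaphore released from `HAL_SPI_RxCpltCallback()`. The `fake_*` functions are a host-side stand-in so the sketch can be exercised without hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define W25Q_CMD_READ_DATA 0x03u /* "Read Data" opcode, 24-bit address follows */

/* Hypothetical bus abstraction; all callbacks return true on success. */
typedef struct {
    void (*select)(void *ctx);                               /* CS low */
    void (*deselect)(void *ctx);                             /* CS high */
    bool (*send)(void *ctx, const uint8_t *src, size_t len); /* short blocking TX */
    bool (*start_rx)(void *ctx, uint8_t *dst, size_t len);   /* start bulk RX */
    bool (*wait_rx)(void *ctx, uint32_t timeout_ms);         /* wait for RX done */
    void *ctx;
} flash_bus_t;

/* Read len bytes starting at addr using one bulk transfer for the payload. */
bool flash_read_bulk(const flash_bus_t *bus, uint32_t addr,
                     uint8_t *dst, size_t len, uint32_t timeout_ms)
{
    const uint8_t cmd[4] = {
        W25Q_CMD_READ_DATA,
        (uint8_t)(addr >> 16), (uint8_t)(addr >> 8), (uint8_t)addr
    };
    bool ok;

    bus->select(bus->ctx);
    ok = bus->send(bus->ctx, cmd, sizeof cmd)
      && bus->start_rx(bus->ctx, dst, len)
      && bus->wait_rx(bus->ctx, timeout_ms);
    bus->deselect(bus->ctx); /* always release CS, even on failure */
    return ok;
}

/* ---- Host-side fake bus, for illustration only ---- */
typedef struct {
    uint8_t cmd[4];
    bool    selected;
    bool    rx_done;
} fake_bus_state_t;

static void fake_select(void *ctx)   { ((fake_bus_state_t *)ctx)->selected = true; }
static void fake_deselect(void *ctx) { ((fake_bus_state_t *)ctx)->selected = false; }

static bool fake_send(void *ctx, const uint8_t *src, size_t len)
{
    fake_bus_state_t *st = ctx;
    if (!st->selected || len != sizeof st->cmd) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        st->cmd[i] = src[i];
    }
    return true;
}

static bool fake_start_rx(void *ctx, uint8_t *dst, size_t len)
{
    fake_bus_state_t *st = ctx;
    for (size_t i = 0; i < len; i++) {
        dst[i] = (uint8_t)(st->cmd[3] + i); /* pattern based on low address byte */
    }
    st->rx_done = true; /* a real DMA would signal this from its ISR */
    return true;
}

static bool fake_wait_rx(void *ctx, uint32_t timeout_ms)
{
    (void)timeout_ms;
    return ((fake_bus_state_t *)ctx)->rx_done;
}

flash_bus_t fake_bus(fake_bus_state_t *st)
{
    flash_bus_t bus = { fake_select, fake_deselect, fake_send,
                        fake_start_rx, fake_wait_rx, st };
    return bus;
}
```

The point of the structure is that chip select stays asserted for the whole transaction and the payload is moved by hardware, so the per-sample audio interrupt no longer sits between individual byte transfers.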