Visitor II
January 11, 2022
Solved

SD Card + RTOS on F7 processor won't work with D-Cache enabled! (My fix below)

  • January 11, 2022
  • 5 replies
  • 2823 views

*** EDIT ***

It seems this was a memory issue rather than a D-Cache issue; the D-Cache option fixed it as a side effect!

*** EDIT ***

I have a project setup as follows:

STM32F765IIKx

RTOS

SD Card on SDMMC1 configured as SD 4 bits Wide, DMA Enabled

I followed this example; however, after a day of head scratching with nothing working, I found that the only way to get it to work was to disable the D-Cache. Once I had found this, I remembered seeing the option for cache maintenance in the "sd_diskio.c" file.

/*
 * when using cacheable memory region, it may be needed to maintain the cache
 * validity. Enable the define below to activate a cache maintenance at each
 * read and write operation.
 * Notice: This is applicable only for cortex M7 based platform.
 */
/* USER CODE BEGIN enableSDDmaCacheMaintenance */
 #define ENABLE_SD_DMA_CACHE_MAINTENANCE 1
/* USER CODE END enableSDDmaCacheMaintenance */
 
/*
* Some DMA requires 4-Byte aligned address buffer to correctly read/write data,
* in FatFs some accesses aren't thus we need a 4-byte aligned scratch buffer to correctly
* transfer data
*/
/* USER CODE BEGIN enableScratchBuffer */
//#define ENABLE_SCRATCH_BUFFER
/* USER CODE END enableScratchBuffer */

On a punt, I set this option and re-enabled the D-Cache, and it now works.

So why does the F7 processor also require this cache maintenance setting for SD to work properly? I've not used any cache maintenance with my UART DMA routines and they work fine on this F7 processor; however, I did need UART DMA cache maintenance on another project using an H7 processor.

*EDIT*

I also needed this config switch set! I had removed it thinking it was only the D-Cache issue but this is also required for correct operation.

/* USER CODE BEGIN enableScratchBuffer */
#define ENABLE_SCRATCH_BUFFER
/* USER CODE END enableScratchBuffer */
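
The scratch-buffer idea described in the comment above can be sketched on its own, outside the ST middleware. This is a minimal illustration, not the code in sd_diskio.c: `write_block`, `scratch`, `dma_write_fn` and `fake_dma_write` are hypothetical names, and the 512-byte block size is an assumption. If the caller's buffer is not 4-byte aligned, the data is copied into an aligned scratch buffer before being handed to the DMA routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512u /* assumed SD block size in bytes */

/* uint32_t storage guarantees 4-byte alignment for the scratch area. */
static uint32_t scratch[BLOCK_SIZE / 4u];

/* Hypothetical DMA write routine: requires a 4-byte aligned source. */
typedef int (*dma_write_fn)(const uint8_t *aligned_src, uint32_t sector);

/* Write one block, bouncing through the scratch buffer when the
 * caller's pointer is not 4-byte aligned. */
static int write_block(const uint8_t *buff, uint32_t sector, dma_write_fn dma_write)
{
    if (((uintptr_t)buff & 3u) == 0u) {
        return dma_write(buff, sector);      /* already aligned: no copy */
    }
    memcpy(scratch, buff, BLOCK_SIZE);       /* unaligned: copy first */
    return dma_write((const uint8_t *)scratch, sector);
}

/* Host-side stand-in for the DMA routine: records the pointer it was
 * given and fails if that pointer is not 4-byte aligned. */
static const uint8_t *last_src;

static int fake_dma_write(const uint8_t *aligned_src, uint32_t sector)
{
    (void)sector;
    last_src = aligned_src;
    return (((uintptr_t)aligned_src & 3u) == 0u) ? 0 : -1;
}
```

The extra copy costs CPU time, which is presumably why the option is disabled by default and only needed when FatFs passes an unaligned buffer.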

Matt.


    Technical Moderator
    January 12, 2022

    Dear @mantisrobot​ ,

    Please refer to AN4839, "Level 1 cache on STM32F7 Series and STM32H7 Series", sections:

    3.2 Example for cache maintenance and data coherency

    4 Mistakes to avoid and tips

    SofLit

    Visitor II
    January 12, 2022

    Hi,

    Thanks for the tips and link!

    First of all, I have been getting inconsistent results, as per the edit to my post above, so it seems it wasn't a D-Cache problem but rather a memory problem, and the D-Cache change fixed it as a side effect!

    I have read through the tips and mistakes section and I think I'm doing things correctly, so for a UART DMA write I would do something like this:

    #define BUFFER_SIZE 100
    // Align the buffer to a 32-byte cache line and round its size up to a multiple of 32 bytes
    ALIGN_32BYTES(uint8_t	dmaBuffer[(BUFFER_SIZE + 31U) & ~(uint32_t)0x1F]);

    // Only do cache maintenance if the D-Cache is enabled
    if ((SCB->CCR & SCB_CCR_DC_Msk) != 0U)
    	{
    	// Clean the D-Cache so the DMA reads up-to-date data from SRAM.
    	// The buffer is already 32-byte aligned, so clean its whole (padded) size.
    	SCB_CleanDCache_by_Addr((uint32_t *)dmaBuffer, (int32_t)sizeof(dmaBuffer));
    	}

    // Start the DMA transmit
    HAL_UART_Transmit_DMA(&usart6, (uint8_t *)dmaBuffer, BUFFER_SIZE);

    And for receive:

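    #define BUFFER_SIZE 100
    // Align the buffer to a 32-byte cache line and round its size up to a multiple of 32 bytes
    ALIGN_32BYTES(uint8_t	dmaRxBuffer[(BUFFER_SIZE + 31U) & ~(uint32_t)0x1F]);

    // Only do cache maintenance if the D-Cache is enabled
    if ((SCB->CCR & SCB_CCR_DC_Msk) != 0U)
    	{
    	// Invalidate (not clean) the D-Cache lines covering the buffer.
    	// The buffer should also be invalidated after reception completes, before the CPU reads it.
    	SCB_InvalidateDCache_by_Addr((uint32_t *)dmaRxBuffer, (int32_t)sizeof(dmaRxBuffer));
    	}

    // Start the DMA receive
    HAL_UARTEx_ReceiveToIdle_DMA(&usart6, (uint8_t *)dmaRxBuffer, BUFFER_SIZE);
    // Disable the half-transfer interrupt; the macro takes the DMA handle, not the UART handle
    __HAL_DMA_DISABLE_IT(usart6.hdmarx, DMA_IT_HT);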

    However, reading these two lines makes me think I should not be using the D-Cache clean method, but rather setting up non-cached memory regions? I'm not sure where to start with that.

    • Always better to use non-cacheable regions for DMA buffers. The software can use the MPU to set up non-cacheable memory block to use as a shared memory between the CPU and DMA.

    • Do not enable cache for the memory that is being used extensively for a DMA operation.
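
As a rough illustration of the alignment arithmetic used in the snippets above: cache maintenance by address works on whole 32-byte lines on the Cortex-M7, so the start address is rounded down and the end rounded up to a line boundary. The sketch below is host-compilable and does not call CMSIS; `cache_align_range` and `cache_range_t` are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 D-Cache line size in bytes */

typedef struct {
    uintptr_t addr; /* start address rounded down to a line boundary */
    size_t    size; /* length rounded up to cover whole lines */
} cache_range_t;

/* Expand [addr, addr + len) to the smallest range made of whole
 * cache lines that fully contains it. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask  = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    const uintptr_t start = addr & ~mask;
    const uintptr_t end   = (addr + len + mask) & ~mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}
```

On the target, `addr` and `size` from the result would be the arguments passed to SCB_CleanDCache_by_Addr() or SCB_InvalidateDCache_by_Addr(). If a buffer shares a cache line with unrelated variables, invalidating that line can discard pending writes to those variables, which is why the buffers above are both aligned and padded to a multiple of 32 bytes.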

    mƎALLEm (Best answer)
    Technical Moderator
    January 12, 2022

    You can either set up a non-cached memory region, which is the simplest way but decreases CPU performance, or do cache maintenance as described in section 3.2, "Example for cache maintenance and data coherency":

    "The data coherency between the core and the DMA is ensured by:

    1. Either making the SRAM1 buffers not cacheable

    2. Or making the SRAM1 buffers cache enabled with write-back policy, with the coherency ensured by software (clean or invalidate D-Cache)

    3. Or modifying the SRAM1 region in the MPU attribute to a shared region.

    4. Or making the SRAM1 buffer cache enabled with write-through policy."

    Note that the write-through policy is not recommended for the F7: see Errata 2.1.1, "Cortex®-M7 data corruption when using data cache configured in write-through".
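
For option 1, a non-cacheable region is described to the MPU through its region attribute and size register (MPU_RASR on ARMv7-M). The sketch below only computes the register value on the host so the field encoding can be checked; `mpu_rasr_noncacheable` is a hypothetical helper, and the choice of full access with execute-never is an assumption for a data-only DMA buffer. On the target, the same attributes would normally be set through HAL_MPU_ConfigRegion(), with the region base address aligned to the region size and the DMA buffers placed inside that region by the linker.

```c
#include <stdint.h>

/* MPU_RASR field positions (ARMv7-M architecture). */
#define RASR_ENABLE     (1u << 0)  /* region enable */
#define RASR_SIZE_POS   1u         /* region size = 2^(SIZE + 1) bytes */
#define RASR_TEX_POS    19u        /* type extension field */
#define RASR_AP_POS     24u        /* access permission */
#define RASR_XN         (1u << 28) /* execute never */

#define AP_FULL_ACCESS  3u         /* read/write for privileged and unprivileged code */
#define TEX_NORMAL_NC   1u         /* with C = 0 and B = 0: normal memory, non-cacheable */

/* Build an MPU_RASR value for a non-cacheable, non-executable data
 * region. Returns 0 if size_bytes is not a power of two >= 32. */
static uint32_t mpu_rasr_noncacheable(uint32_t size_bytes)
{
    if (size_bytes < 32u || (size_bytes & (size_bytes - 1u)) != 0u) {
        return 0u;
    }

    uint32_t log2_size = 0u;
    while ((1u << log2_size) < size_bytes) {
        log2_size++;
    }

    /* C (bit 17), B (bit 16) and S (bit 18) are deliberately left at 0. */
    return RASR_XN
         | (AP_FULL_ACCESS << RASR_AP_POS)
         | (TEX_NORMAL_NC << RASR_TEX_POS)
         | ((log2_size - 1u) << RASR_SIZE_POS)
         | RASR_ENABLE;
}
```

With a region like this covering the DMA buffers, no clean or invalidate calls are needed for them, at the cost of every CPU access to those buffers going to SRAM.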

    Visitor II
    January 12, 2022

    Ok,

    So currently I'm using method 2 right?

    2. Or making the SRAM1 buffers cache enabled with write-back policy, with the coherency ensured by software (clean or invalidate D-Cache)

    Other than simplicity is there any advantage to method 1 or 3? Other than the performance loss.

    Option 4 is out due to Errata 2.1.1

    Technical Moderator
    January 12, 2022

    So currently I'm using method 2 right?

    Answer: Yes, since you are using SCB_CleanDCache_by_Addr()/SCB_InvalidateDCache_by_Addr().

    Other than simplicity is there any advantage to method 1 or 3? Other than the performance loss.

    Answer: Just simplicity, at the cost of CPU performance for the non-cached regions.

    Visitor II
    January 12, 2022

    Thanks.

    I'm not sure how to use the MPU yet so I'll stick with method 2 for now.

    I'm currently developing a program that uses an RTOS, and I'm using the FatFs SD middleware driver within STM32CubeIDE. I notice that within the SD disk I/O routines there are while loops with timeouts that can take up to 30 seconds (the default), but there is no yield to the OS.

    I've included an example of the function SD_CheckStatusWithTimeout() with an osDelay(1) added inside the loop, marked by a comment. I'm curious why this isn't there in the RTOS implementation, given that the loop could block for so long?

    static int SD_CheckStatusWithTimeout(uint32_t timeout)
    {
      uint32_t timer;
      /* block until SDIO peripheral is ready again or a timeout occurs */
    #if (osCMSIS <= 0x20000U)
      timer = osKernelSysTick();
      while (osKernelSysTick() - timer < timeout)
    #else
      timer = osKernelGetTickCount();
      while (osKernelGetTickCount() - timer < timeout)
    #endif
      {
        if (BSP_SD_GetCardState() == SD_TRANSFER_OK)
        {
          return 0;
        }
        // ***************************************************************************************
        // SHOULDN'T THIS BE HERE TO GIVE CONTROL BACK TO THE SCHEDULER?
        osDelay(1);
        // ***************************************************************************************
      }

      return -1;
    }
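
The yielding wait loop above can be tried out on a host without an RTOS. The following is a minimal sketch, not the ST middleware code: `wait_hooks_t`, `wait_ready_with_timeout` and the `sim_*` stand-ins are hypothetical names. On the target, the three hooks would wrap osKernelGetTickCount(), the BSP_SD_GetCardState() check and osDelay(1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks so the loop builds without an RTOS. */
typedef struct {
    uint32_t (*get_tick)(void); /* current tick count */
    bool     (*is_ready)(void); /* true once the card is ready */
    void     (*yield)(void);    /* hand control back to the scheduler */
} wait_hooks_t;

/* Poll until ready or until `timeout` ticks have elapsed, yielding
 * between polls. Unsigned subtraction keeps the elapsed-time check
 * correct when the tick counter wraps around. */
static int wait_ready_with_timeout(const wait_hooks_t *h, uint32_t timeout)
{
    const uint32_t start = h->get_tick();

    while ((uint32_t)(h->get_tick() - start) < timeout) {
        if (h->is_ready()) {
            return 0;
        }
        h->yield();
    }
    return -1;
}

/* Host-side simulation: each yield advances time by one tick, and the
 * device becomes ready after a set number of yields. */
static uint32_t sim_tick;
static uint32_t sim_yields_left;

static uint32_t sim_get_tick(void) { return sim_tick; }
static bool     sim_is_ready(void) { return sim_yields_left == 0u; }
static void     sim_yield(void)
{
    sim_tick++;
    if (sim_yields_left > 0u) {
        sim_yields_left--;
    }
}
```

Yielding for one tick per poll means the ready state is noticed up to one tick late, which seems a small price compared with blocking other tasks of equal or lower priority for the whole timeout.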

    Technical Moderator
    January 12, 2022

    For MPU usage, I suggest referring to AN4838, "Managing memory protection unit in STM32 MCUs".

    I suggest opening another thread for the latter issue and closing this thread (cache usage with SD card) by selecting the Best answer for you.