Maintaining CPU data cache coherence for DMA buffers
This topic is inspired by discussions on the ST forum and the ARM forum, where the proper cache maintenance sequence was worked out and a real-life example of a speculative read was observed. Another discussion documents a real-life example of cache eviction.
For Tx (from memory to peripheral) transfers the maintenance is rather simple:
// Application code.
GenerateDataToTransmit(pbData, nbData);
// Prepare and start the DMA Tx transfer.
SCB_CleanDCache_by_Addr(pbData, nbData);
DMA_TxStart(pbData, nbData);
For Rx (from peripheral to memory) transfers the maintenance is a bit more complex:
#define ALIGN_BASE2_CEIL(nSize, nAlign) ( ((nSize) + ((nAlign) - 1)) & ~((nAlign) - 1) )
uint8_t abBuffer[ALIGN_BASE2_CEIL(67, __SCB_DCACHE_LINE_SIZE)] __ALIGNED(__SCB_DCACHE_LINE_SIZE);
// Prepare and start the DMA Rx transfer.
SCB_InvalidateDCache_by_Addr(abBuffer, sizeof(abBuffer));
DMA_RxStart(abBuffer, sizeof(abBuffer));
// Later, when the DMA has completed the transfer.
size_t nbReceived = DMA_RxGetReceivedDataSize();
SCB_InvalidateDCache_by_Addr(abBuffer, nbReceived);
// Application code.
ProcessReceivedData(abBuffer, nbReceived);
The first cache invalidation, before DMA_RxStart(), ensures that during the DMA transfer the cache holds no dirty lines associated with the buffer, which could otherwise be written back to memory by cache eviction and overwrite the received data. The second cache invalidation, after the DMA transfer has completed, discards any cache lines that speculative reads may have fetched from memory during the transfer. Therefore cache invalidation for Rx buffers must be done both before and after the DMA transfer; skipping either of them can lead to Rx buffer corruption.
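The effect of each invalidation can be illustrated with a toy model that runs on a host PC. This is only a sketch under strong assumptions: a single write-back cache line in front of a single line of memory, with the eviction and the speculative read forced at the worst possible moments. All Toy_ names are hypothetical and are not part of CMSIS or any vendor library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_LINE_SIZE 32U

/* Toy model: one cache line in front of one line of memory. */
typedef struct {
    uint8_t abMemory[TOY_LINE_SIZE]; /* what the DMA writes */
    uint8_t abLine[TOY_LINE_SIZE];   /* cached copy seen by the CPU */
    bool fValid;
    bool fDirty;
} ToyCache;

/* Line fill, e.g. caused by a (speculative) read. No-op if the line is valid. */
static void Toy_Fill(ToyCache *p)
{
    if (!p->fValid) {
        memcpy(p->abLine, p->abMemory, TOY_LINE_SIZE);
        p->fValid = true;
        p->fDirty = false;
    }
}

/* CPU write with write-allocate: only the cache is updated, the line gets dirty. */
static void Toy_CpuWrite(ToyCache *p, uint8_t bValue)
{
    Toy_Fill(p);
    memset(p->abLine, bValue, TOY_LINE_SIZE);
    p->fDirty = true;
}

static uint8_t Toy_CpuRead(ToyCache *p) { Toy_Fill(p); return p->abLine[0]; }

/* Eviction writes a dirty line back to memory, then drops the line. */
static void Toy_Evict(ToyCache *p)
{
    if (p->fValid && p->fDirty) {
        memcpy(p->abMemory, p->abLine, TOY_LINE_SIZE);
    }
    p->fValid = false;
    p->fDirty = false;
}

/* Invalidation drops the line without any write-back. */
static void Toy_Invalidate(ToyCache *p) { p->fValid = false; p->fDirty = false; }

/* The DMA bypasses the cache and writes memory directly. */
static void Toy_DmaWrite(ToyCache *p, uint8_t bValue)
{
    memset(p->abMemory, bValue, TOY_LINE_SIZE);
}

/* Runs one Rx transfer of the byte 0xAA and returns what the CPU reads afterwards. */
static uint8_t Toy_Receive(bool fInvalidateBefore, bool fInvalidateAfter, bool fEvictDuringDma)
{
    ToyCache c = { 0 };
    Toy_CpuWrite(&c, 0x11U);                     /* earlier use leaves a dirty line */
    if (fInvalidateBefore) { Toy_Invalidate(&c); }
    Toy_Fill(&c);                                /* speculative read before the DMA data lands */
    Toy_DmaWrite(&c, 0xAAU);
    if (fEvictDuringDma) { Toy_Evict(&c); }
    if (fInvalidateAfter) { Toy_Invalidate(&c); }
    return Toy_CpuRead(&c);
}

void Toy_Demo(void)
{
    assert(Toy_Receive(true,  true,  false) == 0xAAU); /* both invalidations: correct data */
    assert(Toy_Receive(true,  true,  true)  == 0xAAU); /* eviction of a clean line is harmless */
    assert(Toy_Receive(false, true,  true)  == 0x11U); /* dirty line evicted over the DMA data */
    assert(Toy_Receive(true,  false, false) == 0x00U); /* stale line from the speculative read */
}
```

Calling Toy_Demo() from main() exercises all four cases. In this model, skipping the first invalidation only fails when an eviction actually happens during the transfer, which is consistent with such bugs being intermittent on real hardware.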
Invalidating the cache for an arbitrary buffer can corrupt adjacent memory before and after that buffer, because invalidation always operates on whole cache lines: if a line is shared with other variables, any of their data that has not yet been written back is discarded as well. To prevent this, the buffer has to exactly fill an integer number of cache lines, which means that both the buffer address and the buffer size must be aligned to the cache line size. The CMSIS-defined constant for the data cache line size is __SCB_DCACHE_LINE_SIZE, which is 32 bytes for the Cortex-M7 processor. __ALIGNED() is a CMSIS-defined macro for aligning the address of a variable. ALIGN_BASE2_CEIL() is a custom macro that rounds an arbitrary number up to the nearest multiple of a power of two. In this example 67 is rounded up to a multiple of 32, so the buffer size is 96 bytes.
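The rounding arithmetic and the alignment requirement can be checked at compile time and on a host PC. The following is a minimal sketch, assuming a C11 compiler. The fallback definition of __SCB_DCACHE_LINE_SIZE and the use of _Alignas in place of the CMSIS __ALIGNED() macro are assumptions made only so that the code builds without CMSIS headers, and IsCacheLineAligned() is a hypothetical helper, not a CMSIS function.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Assumption for a host build: CMSIS normally defines this (32 on Cortex-M7). */
#ifndef __SCB_DCACHE_LINE_SIZE
#define __SCB_DCACHE_LINE_SIZE 32U
#endif

/* Rounds nSize up to the nearest multiple of nAlign; nAlign must be a power of two. */
#define ALIGN_BASE2_CEIL(nSize, nAlign) ( ((nSize) + ((nAlign) - 1)) & ~((nAlign) - 1) )

/* (67 + 31) & ~31 = 98 & ~31 = 96 */
static_assert(ALIGN_BASE2_CEIL(67U, 32U) == 96U, "67 rounds up to 96");
/* A value that is already a multiple stays unchanged. */
static_assert(ALIGN_BASE2_CEIL(64U, 32U) == 64U, "64 stays 64");
static_assert(ALIGN_BASE2_CEIL(1U, 32U) == 32U, "1 rounds up to 32");

/* Hypothetical helper: nonzero if the buffer exactly fills whole cache lines. */
static inline int IsCacheLineAligned(const void *pb, size_t nb)
{
    return (((uintptr_t)pb % __SCB_DCACHE_LINE_SIZE) == 0U) &&
           ((nb % __SCB_DCACHE_LINE_SIZE) == 0U);
}

/* Host stand-in for: uint8_t abRxBuffer[...] __ALIGNED(__SCB_DCACHE_LINE_SIZE); */
static _Alignas(__SCB_DCACHE_LINE_SIZE)
    uint8_t abRxBuffer[ALIGN_BASE2_CEIL(67U, __SCB_DCACHE_LINE_SIZE)];

static_assert(sizeof(abRxBuffer) == 96U, "buffer occupies three 32-byte cache lines");
```

A runtime check such as assert(IsCacheLineAligned(abRxBuffer, sizeof(abRxBuffer))) placed before the invalidation can catch buffers that were declared without the alignment attribute or with an unrounded size.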
Unfortunately, ARM doesn't provide a clear explanation or example for Cortex-M processors, but it does provide a short explanation for the Cortex-A and Cortex-R series processors.
