SRAM1 / SRAM2 / SRAM3 tradeoff with respect to CPU / GPDMA
OK, this is perhaps esoteric, but maybe there is some research out there.
I am looking at STM32U3 or STM32C5, but the question applies to pretty much all CM33-based ST MCUs.
The bus matrix implies a fast multiplexer for GPDMA (and SDMMC) to SRAM1. So unless dealing with burst transfers to SRAM (yes, SDMMC does that, and a GPDMA channel with the 32-byte FIFO could do that if the transfer length is a multiple of 4 bytes), this fast multiplexer seems to save 1 clock, reducing a beat to 2 clocks (if somebody has details on AHB vs. APB timing here, that would be interesting). So to minimize latency and maximize bandwidth, DMA buffers (a .dma section) should go into SRAM1.
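To make that concrete, this is roughly what I have in mind. The .dma section name, its mapping to SRAM1 in the linker script, and the buffer names are just my own convention, not anything ST-defined:

    #include <stdint.h>

    /* Hypothetical placement: the linker script maps the ".dma" input section
     * into the SRAM1 region. Section name and sizes are my own convention. */
    __attribute__((section(".dma"), aligned(32)))
    static uint8_t uart_rx_buf[256];   /* GPDMA circular RX buffer */

    __attribute__((section(".dma"), aligned(32)))
    static uint8_t spi_tx_buf[64];     /* multiple of 4 bytes, so the 32-byte FIFO can burst */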
So if DMA goes into SRAM1, shouldn't all the stacks then go into SRAM2? I would assume burst DMA to/from SRAM1 introduces some additional latency, particularly in the path of ISR stacking/unstacking. Am I just overthinking this, or is it measurable?
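My rough idea for measuring it (a sketch, assuming CYCCNT is implemented on the part): kick off a GPDMA memory-to-memory transfer into the SRAM under test (not shown), cycle-count a stack-heavy routine with the DWT, then relink the stack into the other SRAM and compare:

    #include "stm32u3xx.h"   /* CMSIS device header; adjust for the actual part */

    static uint32_t time_stack_work(void)
    {
        volatile uint32_t scratch[16];           /* lives on the current stack */

        CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
        DWT->CYCCNT = 0;
        DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;

        uint32_t start = DWT->CYCCNT;
        for (uint32_t i = 0; i < 1000; i++) {
            scratch[i & 15] = i;                 /* forced stack traffic */
        }
        return DWT->CYCCNT - start;              /* cycles with/without DMA contention */
    }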
Now some U3 parts have SRAM3. One thought is to allocate the DMA buffers for SDMMC from SRAM3, since SDMMC deals with 512-byte chunks, i.e. burst transfers. By keeping the other GPDMA buffers in SRAM1, the longer SDMMC bursts should not affect GPDMA.
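Something like this for the SDMMC side (again, the .sdmmc_dma section name and its mapping to SRAM3 are just my own naming):

    #include <stdint.h>

    /* Hypothetical: linker maps ".sdmmc_dma" into SRAM3, keeping the long
     * 512-byte SDMMC bursts off the SRAM1 port that GPDMA uses. */
    __attribute__((section(".sdmmc_dma"), aligned(32)))
    static uint8_t sd_block_buf[2][512];   /* ping-pong block buffers */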
In a scheme like this, where should .data/.bss/.noinit go? SRAM1, where latency is lower but bandwidth is shared with the DMA buffers, or SRAM3, where latency is potentially higher?
In general, is there any analysis out there looking at .stack vs. .data/.bss/.noinit vs. .heap latency/bandwidth?
