STM32H757 DMA IRQ latency
Hello,
I am using an STM32H757 to convert 12 analog channels using ADC2 and ADC3 (code running on the Cortex-M7).
- Both ADCs are configured with a 42 MHz synchronous clock, triggered by a timer.
- ADC2 uses DMA1 Stream 0 to transfer data to AXI SRAM.
- ADC3 uses DMA1 Stream 1 to transfer data to AXI SRAM.
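In case it matters, the stream setup boils down to the pattern below. This is a register-level sketch, not my actual init code: the addresses and bit positions follow RM0399, the buffer/function names are placeholders, and the DMAMUX request routing is omitted.

```c
#include <stdint.h>

/* Sketch: ADC2 -> AXI SRAM on DMA1 Stream 0 (verify against RM0399). */
#define DMA1_S0CR    (*(volatile uint32_t *)0x40020010UL)
#define DMA1_S0NDTR  (*(volatile uint32_t *)0x40020014UL)
#define DMA1_S0PAR   (*(volatile uint32_t *)0x40020018UL)
#define DMA1_S0M0AR  (*(volatile uint32_t *)0x4002001CUL)

enum {
    DMA_SxCR_EN       = 1UL << 0,
    DMA_SxCR_TCIE     = 1UL << 4,   /* transfer-complete interrupt */
    DMA_SxCR_CIRC     = 1UL << 8,   /* circular mode */
    DMA_SxCR_MINC     = 1UL << 10,  /* increment memory address */
    DMA_SxCR_PSIZE_16 = 1UL << 11,  /* 16-bit peripheral size */
    DMA_SxCR_MSIZE_16 = 1UL << 13,  /* 16-bit memory size */
};

/* Peripheral-to-memory, 16-bit, circular, TC interrupt enabled. */
enum { ADC2_S0CR_CFG = DMA_SxCR_TCIE | DMA_SxCR_CIRC | DMA_SxCR_MINC
                     | DMA_SxCR_PSIZE_16 | DMA_SxCR_MSIZE_16 };

void dma1_s0_start(volatile uint16_t *axi_buf,
                   volatile uint32_t *adc2_dr, uint32_t count)
{
    DMA1_S0PAR  = (uint32_t)(uintptr_t)adc2_dr;  /* ADC2 data register */
    DMA1_S0NDTR = count;                         /* channels on this ADC */
    DMA1_S0M0AR = (uint32_t)(uintptr_t)axi_buf;  /* destination in AXI SRAM */
    DMA1_S0CR   = ADC2_S0CR_CFG | DMA_SxCR_EN;
}
```

ADC3's Stream 1 setup is the same apart from the register offsets and the DMAMUX request ID.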
Using a scope, I can see a small drop on the analog lines that reveals the sampling points. The timing of the 12 sampling points fits the expected behavior perfectly (relative to the timer pulse), so the ADC clock, sample time, and conversion time are verified and good.
CPU clock = 336 MHz
AXI/AHB clock = 84 MHz
The background loop, IRQ routine, and vector table code have all been moved to ITCM RAM. All data resides in DTCM RAM. Nothing runs from flash and there is no data in SRAM.
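The vector table relocation itself is just a VTOR write after copying the table into ITCM; the only subtlety is the alignment requirement. A sketch (the ITCM base address passed in is whatever the linker script places the copied table at):

```c
#include <stdint.h>

#define SCB_VTOR (*(volatile uint32_t *)0xE000ED08UL)

/* Armv7-M: VTOR must be aligned to the next power of two that is
   >= (number of exception vectors * 4), with a 128-byte minimum. */
uint32_t vtor_alignment(uint32_t vector_count)
{
    uint32_t bytes = vector_count * 4U;
    uint32_t align = 128U;
    while (align < bytes)
        align <<= 1;
    return align;
}

void relocate_vectors(uint32_t itcm_table_addr)
{
    SCB_VTOR = itcm_table_addr;   /* table already copied to ITCM */
#if defined(__ARM_ARCH)
    asm volatile("dsb\n\tisb");   /* make the new table visible */
#endif
}
```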
Code is compiled in release mode, optimized for speed.
I'm using the HAL only for init; all run-time code uses direct register access.
I'm instrumenting the code with a GPIO configured as EVENTOUT and asm("sev").
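For completeness, the instrumentation pin is just a GPIO in alternate-function mode with the EVENTOUT AF selected. A sketch assuming PA8 (the pin choice is arbitrary; AF15 is EVENTOUT on the STM32H7 as far as I know, but check the datasheet):

```c
#include <stdint.h>

#define GPIOA_MODER (*(volatile uint32_t *)0x58020000UL)
#define GPIOA_AFRH  (*(volatile uint32_t *)0x58020024UL)

/* 2-bit MODER field shift and 4-bit AFR nibble shift for a pin. */
uint32_t moder_shift(uint32_t pin) { return pin * 2U; }
uint32_t afr_shift(uint32_t pin)   { return (pin & 7U) * 4U; }

void pa8_as_eventout(void)
{
    /* MODER[17:16] = 0b10 : alternate function */
    GPIOA_MODER = (GPIOA_MODER & ~(3UL << moder_shift(8)))
                | (2UL << moder_shift(8));
    /* AFRH nibble for pin 8 = 15 : EVENTOUT */
    GPIOA_AFRH  = (GPIOA_AFRH & ~(15UL << afr_shift(8)))
                | (15UL << afr_shift(8));
}
```

Each asm("sev") then produces a short pulse on the pin, which is what I trigger the scope on.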
void DMA1_Stream0_IRQHandler(void)
{
    asm volatile("sev");
    asm volatile("" ::: "memory"); // Prevent the optimizer from reordering across the asm.
    dmaADC2regs->LIFCR = 0x20;     // Clear the Stream 0 transfer-complete flag (CTCIF0).
    asm volatile("" ::: "memory");
    // Copy 12 uint16_t words from AXI SRAM to DTCM RAM (64-bit data bus: 3 transfers).
    uint64_t* dst = (uint64_t*)adc_dtcm;
    uint64_t const* src = (uint64_t const*)adc_axi;
    for (int i = 0; i < 3; i++)
        dst[i] = src[i];
    asm volatile("" ::: "memory");
    asm volatile("sev");
}

From the ADC end of the last sample conversion to the GPIO/EVENTOUT edge: 556 ns -0/+24 ns (187 -0/+8 cycles)
Interrupt execution time: ~370 ns
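(The cycle counts are simply the scope measurements scaled by the 336 MHz CPU clock, rounded to the nearest cycle:)

```c
#include <stdint.h>

/* Convert a scope measurement in ns to CPU cycles at cpu_mhz MHz,
   rounded to the nearest cycle. */
uint32_t ns_to_cycles(uint32_t ns, uint32_t cpu_mhz)
{
    return (ns * cpu_mhz + 500U) / 1000U;
}
```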
The execution time is fine, but the IRQ servicing latency is definitely too long for my application. I don't see what I can do to be faster; 187 cycles seems like too much, so there must be something wrong somewhere.
Any suggestion will be much appreciated :)
Thanks
