Visitor II
January 17, 2024
Question

STM32L0x3 Interrupt Latency

  • 8 replies
  • 5003 views

Good morning,

I am working with a board that uses an STM32L0x3 microcontroller. My SystemCoreClock is 32 MHz and TIM21 is clocked at 4 MHz. TIM21 is configured in input capture mode on Channel 1 (falling edge).
The ISR does the following:

void TIM21_IRQHandler(void)
{
  /* USER CODE BEGIN TIM21_IRQn 0 */

  /* USER CODE END TIM21_IRQn 0 */

  /* Set PB11 high to mark ISR entry on the scope */
  GPIOB->BSRR = GPIO_PIN_11;

  /* USER CODE BEGIN TIM21_IRQn 1 */

  /* USER CODE END TIM21_IRQn 1 */
}

 


To my surprise, using an oscilloscope, I noted that the time elapsed from the falling edge of the input signal until the output pin is set is about 1 µs.
Why do I have all this latency?
Thanks to all who can help.

    This topic has been closed for replies.


    Graduate II
    January 17, 2024

    For an interrupt, up to 16 registers need to be pushed onto the stack, the IRQ code needs to be fetched from flash with wait states, and finally the GPIO needs to be set. This takes some cycles, and at 32 MHz, 1 microsecond is only 32 cycles.

    Dave94Author
    Visitor II
    January 17, 2024

    How many clock cycles are needed to fetch the IRQ and set the output?

    Super User
    January 17, 2024
    > How many clock cycles are needed to fetch the IRQ and set the output?
     
    First, there may be some delay for the interrupt signal to propagate from TIM to NVIC.
     
    Then all latencies stemming from multi-cycle instruction execution, other interrupts of higher or equal priority being serviced, global interrupt disables/enables in code, etc., have to be taken into account.
     
    Then the interrupt entry process starts. Unless there's some extreme FLASH latency (assuming the interrupt vector table is in FLASH), registers (not 16 but 8) are stacked in parallel with fetching the interrupt vector, and that lasts 12 cycles.
     
    Then the interrupt code has to be executed - see the disasm for what instructions are involved (e.g. the C function prologue), and if executing from FLASH, take the FLASH latency into account - up until the point when the GPIO register is written; and that write has to propagate through the bus matrix to the GPIO itself.
     
    32 cycles is just about right for a very simple ISR written in C, with no other sources of latency present.
     
    JW
     
    Dave94Author
    Visitor II
    January 18, 2024

    I have found at this link that the interrupt latency for a Cortex-M0+ should be 15 clock cycles:

    https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/beginner-guide-on-interrupt-latency-and-interrupt-latency-of-the-arm-cortex-m-processors

    Even if I consider that this instruction

    GPIOB->BSRR = GPIO_PIN_11 ;

    could take 5 clock cycles, why should I get the 32 cycles you mentioned?

    Super User
    January 18, 2024

    Ok, and what did you set as the optimizer level? If you want it fast, you need -O2 or -Ofast; otherwise the code does not run at the expected speed and is only good for debugging.

    -> project settings

    [screenshot: project optimization settings]

     

    Super User
    January 18, 2024

    > I have found at this link that the interrupt latency for a CortexM0+ should be 15 clock cycles.

    I stand corrected; wasn't aware of the difference in CM0/CM0+. Thanks.

    > Even if I consider that this instruction

    > GPIOB->BSRR = GPIO_PIN_11 ;

    > could take 5 clock cycles,

    Why would that be the case?

    Post disasm of the ISR.

    What's the FLASH latency?

    JW

    ST Employee
    February 15, 2024

    Hi @Dave94 

     

    The forum moderator had marked your post as needing a little more investigation and direct support. An online support case has been created on your behalf; please stand by for just a moment and you will hear from us.

     

    Regards,

    Billy

    Super User
    February 15, 2024

    Hi @Billy OWEN ,

    will we also hear from you here?

    JW

    ST Employee
    February 15, 2024

    Hi @waclawek.jan 

     

    My piece of the pie was the small legwork to get the right people involved, the topic is no longer in my hands. Since there was an amount of back and forth here (and time) before the moderators chose to take it internally, let me make notes in the case to the MCU apps team that closing the loop back here is requested.

     

    Thanks,

    Billy

     

     

    Super User
    February 15, 2024

    Hi @Billy OWEN ,

    > let me make notes in the case to the MCU apps team that closing the loop back here is requested

    Thanks.

    We all can learn from cases which get resolved eventually - and sometimes also from those which don't.

    JW

    ST Employee
    February 19, 2024

    Hello All,
    the testing conditions in the initial question are not representative for measuring ISR latency.
    Running at a 32 MHz SYSCLK adds an extra flash wait state, and different AHB and APBx clocks add synchronization delays between the buses, which cause variation in ISR latency.
    I/O toggling by writing to the port register takes up to 6 cycles, depending on the optimization setting.

    I tested on my side and was able to reduce the ISR latency to 21 clocks, and to 20 cycles with fast optimization. It was measured by a logic analyzer from the rising edge on the input capture channel to the rising edge of an output pin controlled by the __SEV() instruction in TIM21_IRQHandler. The test application runs at 4 MHz = 0 flash wait states and AHB = APBx. Moving the ISR into SRAM did not give better results.

    Currently I'm asking internally where these extra 4/5 cycles are burned, because the ISR latency on the M0+ is 15 cycles. But I expect some latency in the timer and in the propagation to the NVIC.

    BR. Jan

    Super User
    February 19, 2024

    Hi @Jan KRUPICKA ,

    Thanks for the comment.

    Can you please post the ISR disasm (possibly together with source) for further discussion?

    I can see several ways to expand this into a knowledgebase article - e.g. gradually adding FLASH latency (in order to increase the clock, although the clock does not necessarily have to be increased), APB prescalers, the TIM input channel filter, various levels of optimization, larger ISR code (which results in a more extensive prologue), Cube/HAL ISRs, other ISRs, and execution of multi-byte instructions or other instructions which may delay interrupt execution (possibly introducing jitter rather than hard latency) - and demonstrate the effect of all of these on the resulting latency. @KB-IDEA 

    Also, in the initial post, GPIO was used rather than SEV. In the Cortex-M0+, as a specialty of this particular core, the GPIOs are on the processor's single-cycle I/O port. While one would expect that to have minimum latency (i.e. the output would change in the next cycle after the respective store instruction has executed), is it really so? I.e., how does using GPIO compare to using SEV?

    Thanks,

    JW