Visitor II
January 17, 2024
Question

STM32L0x3 Interrupt Latency

  • 8 replies
  • 5003 views

Good morning,

I am working with a board that uses an STM32L0x3 microcontroller. My SystemCoreClock is 32 MHz and TIM21 is clocked at 4 MHz. TIM21 is configured in input capture mode on Channel 1 (falling edge).
The ISR does the following:

void TIM21_IRQHandler(void)
{
  /* USER CODE BEGIN TIM21_IRQn 0 */

  /* USER CODE END TIM21_IRQn 0 */

  /* Set PB11 high to mark ISR entry on the scope */
  GPIOB->BSRR = GPIO_PIN_11;

  /* USER CODE BEGIN TIM21_IRQn 1 */

  /* USER CODE END TIM21_IRQn 1 */
}

 


To my surprise, using an oscilloscope, I noted that the time elapsed from the falling edge of the input signal until the output pin is set is about 1 µs.
Why do I have all this latency?
Thanks to all who can help.

    This topic has been closed for replies.


    Graduate II
    January 17, 2024

    For an interrupt, up to 16 registers need to be pushed onto the stack, the IRQ code needs to be fetched from flash with wait states, and finally the GPIO needs to be set. This takes some cycles, and at 32 MHz, 1 microsecond is only 32 cycles.

    Dave94Author
    Visitor II
    January 17, 2024

    How many clock cycles are needed to fetch the IRQ and set the output?

    Super User
    January 17, 2024
    > How many clock cycles are needed to fetch the IRQ and set the output?
     
    First, there may be some delay for the interrupt signal to propagate from TIM to NVIC.
     
    Then all latencies stemming from multi-cycle instruction execution, other interrupts of higher or equal priority being serviced, global interrupt disables/enables in code, etc., have to be taken into account.
     
    Then the interrupt entry process starts. Unless there's some extreme FLASH latency (assuming the interrupt vector table is in FLASH), registers (not 16 but 8) are stacked in parallel with fetching the interrupt vector, and that lasts 12 cycles.
     
    Then the interrupt code has to be executed - see the disasm for what instructions are involved (e.g. the C function prologue), and if executing from FLASH, take the FLASH latency into account - up until the point when the GPIO register is written; and that write has to propagate through the bus matrix to the GPIO itself.
     
    32 cycles is just about right for a very simple ISR written in C, with no other sources of latency present.
     
    JW
     
    Dave94Author
    Visitor II
    January 18, 2024

    I have found at this link that the interrupt latency for a Cortex-M0+ should be 15 clock cycles:

    https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/beginner-guide-on-interrupt-latency-and-interrupt-latency-of-the-arm-cortex-m-processors

    Even if I consider that this instruction

    GPIOB->BSRR = GPIO_PIN_11 ;

    could take 5 clock cycles, why should I get the 32 cycles you mentioned?

    Super User
    January 18, 2024

    Ok, and what did you set as the optimizer level? If you want it fast, you need -O2 or -Ofast; otherwise the code does not run at the expected speed and is only good for debugging.

    -> project settings

    [screenshot: project optimization settings]

     

    Super User
    January 18, 2024

    > I have found at this link that the interrupt latency for a CortexM0+ should be 15 clock cycles.

    I stand corrected; wasn't aware of the difference in CM0/CM0+. Thanks.

    > Even if I consider that this instruction

    > GPIOB->BSRR = GPIO_PIN_11 ;

    > could take 5 clock cycles,

    Why would that be the case?

    Post disasm of the ISR.

    What's the FLASH latency?

    JW

    ST Employee
    February 15, 2024

    Hi @Dave94 

     

    The forum moderator had marked your post as needing a little more investigation and direct support. An online support case has been created on your behalf; please stand by for just a moment and you will hear from us.

     

    Regards,

    Billy

    Super User
    February 15, 2024

    Hi @Billy OWEN ,

    will we also hear from you here?

    JW

    ST Employee
    February 15, 2024

    Hi @waclawek.jan 

     

    My piece of the pie was the small legwork to get the right people involved, the topic is no longer in my hands. Since there was an amount of back and forth here (and time) before the moderators chose to take it internally, let me make notes in the case to the MCU apps team that closing the loop back here is requested.

     

    Thanks,

    Billy

     

     

    Super User
    February 15, 2024

    Hi @Billy OWEN ,

    > let me make notes in the case to the MCU apps team that closing the loop back here is requested

    Thanks.

    We all can learn from cases which get resolved eventually - and sometimes also from those which don't.

    JW

    ST Employee
    February 19, 2024

    Hello All,
    the testing conditions in the initial question are not representative for measuring ISR latency.
    Running at a 32 MHz SYSCLK adds an extra flash wait state, and different AHB and APBx clocks add synchronization delays between the buses, which cause variation in ISR latency.
    I/O toggling by writing to the port register takes up to 6 cycles, depending on the optimization setting.

    I tested on my side and was able to reduce the ISR latency to 21 clocks, and to 20 cycles with fast optimization. It was measured by a logic analyzer from the rising edge on the input capture channel to the rising edge of an output pin controlled by the __SEV() instruction in TIM21_IRQHandler. The test application runs at 4 MHz = 0 flash wait states and AHB = APBx. Moving the ISR into SRAM did not give better results.

    Currently I'm asking internally where these extra 4/5 cycles are burned, because the ISR latency on the M0+ is 15 cycles. But I expect some latency in the timer and in the propagation to the NVIC.

    BR. Jan

    Super User
    February 19, 2024

    Hi @Jan KRUPICKA ,

    Thanks for the comment.

    Can you please post the ISR disasm (possibly together with source) for further discussion?

    I can see several ways to expand this into a knowledgebase article - e.g. gradually adding FLASH latency (in order to increase the clock, although the clock does not necessarily have to be increased), APB prescalers, the TIM input channel filter, various levels of optimization, larger ISR code (which results in a more extensive prologue), Cube/HAL ISRs, other ISRs, and execution of multi-byte instructions or other instructions which may delay interrupt execution (possibly introducing jitter rather than hard latency) - and demonstrate the effect of all of these on the resulting latency. @KB-IDEA 

    Also, in the initial post, GPIO was used rather than SEV. In the Cortex-M0+, as a specialty of this particular core, the GPIOs are on the processor's single-cycle I/O port. While one would expect that to have minimum latency (i.e. the output would change in the next cycle after the respective store instruction has executed), is it really so? I.e., how does using GPIO compare to using SEV?

    Thanks,

    JW