Interrupt curiosity (STM32G491)

July 16, 2025

Hello friends : )

I've recently tested the NUCLEO-G491RE board for an upcoming redesign.

The current design depends on fairly tight interrupt latency, so that is where I began.

As I understand it, with no FPU usage in the ISR (ASPEN = LSPEN = 0) one could expect up to 12 SYSCLK cycles of latency.

In this case 75ns at 160MHz which sounds pretty decent (today it's about 73ns).

I started off by creating two very similar interrupt service routines that I intended to toggle between.

Each would pulse an output pin and trigger the other one:

Attributes: 'interrupt' + 'optimize("-O2")' + 'section(".RamFunc")' + 'aligned(8)' + 'naked'

static void onTimeStampEvent(void)
{
 GPIOA->BSRR = 1<<12;
 NVIC->ISPR[1] = 1<<(39-1*32); // USART3
 GPIOA->BRR = 1<<12;
#ifdef NAKED
 __ASM volatile ("BX LR":::);
#endif
 return;
}

static void onReceiveEvent(void)
{
 GPIOB->BSRR = 1<<14;
 NVIC->ISPR[0] = 1<<(11-0*32); // DMA1_CH1
 GPIOB->BRR = 1<<14;
#ifdef NAKED
 __ASM volatile ("BX LR":::);
#endif
 return;
}

In main the usual suspects:

HAL and system initialization
Peripheral initialization
Setting up interrupts

Finally, the main loop:

 u = 0;
 while(1)
 {
 NVIC->ISPR[0] = 1<<(11-0*32); // DMA1_CH1
 u++;
 }

This works splendidly, sort of...

The total time for a complete round trip is about 381ns or 61 SYSCLK.

Variable "u" in main never changes from "0", which suggests the expected continuous interrupts.

The thing is, shouldn't tail-chaining occur?

Next I tried differentiating the priority levels, yellow being the less important one:

The priorities are in action, indeed; however, a round trip now takes 562ns (90 SYSCLK).

Still no tail-chaining, and worse, lots of extra time for the same amount of work.

Where have I gone wrong?

Any help appreciated = )

/Hen

waclawek.jan

Super User

Try to interpret the timing we see using the disasm.

JW

HenrikGlader (Author)

Explorer

I'm sorry, I'm not too knowledgeable in STM matters.

Would you please explain a little bit further?

BTW, thank you = )

/Hen

waclawek.jan

Super User

If you want to discuss clock-level timing, you have to have a look at the particular instructions executed, not the source code.

JW

HenrikGlader (Author)

Explorer

Yes, that makes perfect sense : )

 onTimeStampEvent:
20000010: mov.w r3, #1207959552 @ 0x48000000
20000014: mov.w r2, #4096 @ 0x1000
20000018: str r2, [r3, #24]
 140 NVIC->ISPR[1] = 1<<(39-1*32); // USART3
2000001a: ldr r3, [pc, #20] @ (0x20000030 <onTimeStampEvent+32>)
2000001c: movs r2, #128 @ 0x80
2000001e: str.w r2, [r3, #260] @ 0x104
 141 GPIOA->BRR = 1<<12;
20000022: mov.w r3, #1207959552 @ 0x48000000
20000026: mov.w r2, #4096 @ 0x1000
2000002a: str r2, [r3, #40] @ 0x28
 143 __ASM volatile ("BX LR":::);
2000002c: bx lr
 145 return;

 onReceiveEvent:
20000038: ldr r3, [pc, #28] @ (0x20000058 <onReceiveEvent+32>)
2000003a: mov.w r2, #16384 @ 0x4000
2000003e: str r2, [r3, #24]
 161 NVIC->ISPR[0] = 1<<(11-0*32); // DMA1_CH1
20000040: ldr r3, [pc, #24] @ (0x2000005c <onReceiveEvent+36>)
20000042: mov.w r2, #2048 @ 0x800
20000046: str.w r2, [r3, #256] @ 0x100
 162 GPIOB->BRR = 1<<14;
2000004a: ldr r3, [pc, #12] @ (0x20000058 <onReceiveEvent+32>)
2000004c: mov.w r2, #16384 @ 0x4000
20000050: str r2, [r3, #40] @ 0x28
 164 __ASM volatile ("BX LR":::);
20000052: bx lr
 166 return;

Is this what you're talking about?

/Hen

HenrikGlader (Author)

Explorer

Will be offline a couple of days, but some extras before I go.

I've set a breakpoint on every handler (with SysTick stopped) and none of them trapped.

HAL_SuspendTick();

The debug variable is at file scope and volatile.

volatile unsigned u;

Tested different priorities with no change (except when both are the same).

Verified that no FPU registers are being stacked.

Any insight is much appreciated.

/Hen

Pavel A.

Super User

NVIC->ISPR[0] = 1<<(39-1*32); // DMA1_CH1

What do you think this statement will do? (Hint: C is not Python!)

This MCU has core-coupled RAM (CCM SRAM), you can put these functions there for better latency.

HenrikGlader (Author)

Explorer

Hmm, from where did you copy this line?

Either "NVIC->ISPR[1] = 1<<(39-1*32); // USART3"

or: "NVIC->ISPR[0] = 1<<(11-0*32); // DMA1_CH1"

My intention was using bit-banding but did not reach that far yet.

On the CCM: indeed, cycles were shaved off, more than I expected...

On equal priority: from 61 to 52 SYSCLK (PicoScope1.png)

With escalating priority: from 90 to 69 SYSCLK (PicoScope2.png)

But still, more SYSCLK with differentiated priorities.

What happened with "Interrupt tail-chaining"?

BTW, I tried booting without debugger but no changes...

waclawek.jan

Super User

Where is the vector table located?

Where is the stack located?

JW

HenrikGlader (Author)

Explorer

Good points : )

SP->SRAM2 // 0x2001BFF0 (BG) and 0x2001BFD0 (IC)

VTOR->SRAM1 // 0x20000800

IC @ CCMRAM // 0x10000000..0x10000044

Maybe swapping SP and VTOR linkage would be more efficient?

I probably need a stack frame (non-naked) in the end.

waclawek.jan

Super User

Try moving stack to CCMRAM.

Fetching the ISR address from the vector table should occur in parallel with the register stacking, so the two should be in different memories; at the same time, the vector fetch should have less of an impact if it takes longer.

These are very complex SoCs, where cycle counting is cumbersome because of the many elements involved, and the theoretical numbers from the processor's specs alone are usually impossible to reach in practice, again because of the huge influence of the whole SoC. Generally, consider the processor's specs to be just sweet marketing speech.

JW

HenrikGlader (Author)

Explorer

Yes, fetching the vector probably occurs once for every interrupt taken.

And because of the initial multi-register stacking, the above may matter less for latency.

However, wouldn't the stack starve the instruction pipe if it resides in CCMRAM as well?

Regarding cycle counting, yes there's a lot going on in parallel here and exact numbers may not exist.

That's not the core of my hacks, just a feel for what's feasible and concurrent comparisons.

I think I can get away with about 60ns latency spread on The One important event.

Today it's 30ns. Well, assuming no degraded performance, which also could be considered.

Then the unexpected scenario happened, interrupt escalation with performance penalty.

That's the real bugger, is it not?

waclawek.jan

Super User

Latency and its jitter are at least an order of magnitude worse in these SoCs than in the 8-bit micro*controllers*, and so are their controllability and the state of documentation. So, I don't consider interrupts to be a viable option for timing-sensitive tasks anymore, and always resort to hardware.

JW

HenrikGlader (Author)

Explorer

Yes, so it is.

I do utilize DMAs, as the snippet may reveal.

The competing core is also a *modern* MCU albeit a bit more recent than the G4.

It is quite up to the task, but for 'platformics' we need to move here.

In the near future we aim at the H7+

waclawek.jan

Super User

> In the near future we aim at the H7+

The interrupt latency and overall timing controllability of the Cortex-M7 are of course worse than those of the Cortex-M4.

That's the price we pay for raw speed.

JW

HenrikGlader (Author)

Explorer

I just realized this may not work for us this way.

CCMRAM seems to be the best performance option.

I tried several linkages for the stack (at the highest address) vs. the RAM code, with not so obvious results.

The stack is at the high end of the area and the RAM code at the low end.

1. Both in 0x10000000[0x4000] --- 52ck vs 67ck.

2. Both in 0x20018000[0x4000] --- 72ck vs 86ck.

3. stack in 0x1xxx code in 0x2xxx --- 72ck vs 92ck.

4. stack in 0x2xxx code in 0x1xxx --- 26ck vs 69ck. ???

There is no spread present because this is the only thing the core does.

What happens when all bells and whistles are in place?

Will the spread (for the most prominent interrupt) be more than 60ns (about 10 SYSCLK)?

I understand the tests do not say squat about this, but they do worry me...
