acapola
Associate II
January 30, 2026
Question

STM32N657: unexpected delay in execution


My original aim was to prove to myself that I could measure execution time with cycle accuracy. To do that, I have 9 'test_dwt' functions; they all contain the same code except for the number of nop instructions between the two reads of the DWT cycle counter.

The results were unexpected because the Cortex-M55 has dual-issue capability, as people explained here.

Nevertheless, I am puzzled by the result of test_dwt7 (7 nops). All clocks are at 100 MHz; interrupts, icache and dcache are disabled; and every 'test_dwt' function starts with isb and dsb instructions.
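For reference, here is a minimal C sketch of one such function. It is reconstructed from the disassembly further down, not taken from the original source. The names `dwt_regs_t`, `BARRIERS`, `NOP`, `cycles_elapsed` and `test_dwt3_sketch` are hypothetical. The register layout (base 0xE0001000, cycle counter at offset 4) is the one visible in the disassembly, and the sketch assumes the cycle counter has already been enabled elsewhere.

```c
#include <stdint.h>

/* Assumed layout: only the first two DWT registers are modelled here. */
typedef struct {
    volatile uint32_t ctrl;    /* offset 0x0 */
    volatile uint32_t cyccnt;  /* offset 0x4, the register read in the tests */
} dwt_regs_t;

#if defined(__arm__)
/* Same barrier order as in the disassembly: isb, then dsb. */
#define BARRIERS() __asm volatile("isb\n\tdsb" ::: "memory")
/* The "memory" clobber keeps the compiler from moving the nop
 * across the two volatile counter reads. */
#define NOP()      __asm volatile("nop" ::: "memory")
#else
/* Host build: no-ops, so the logic can be checked off-target. */
#define BARRIERS() ((void)0)
#define NOP()      ((void)0)
#endif

/* Unsigned subtraction stays correct across a 32-bit counter wrap. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical equivalent of a test with three nops between the reads. */
uint32_t test_dwt3_sketch(dwt_regs_t *dwt)
{
    BARRIERS();
    uint32_t start = dwt->cyccnt;
    NOP();
    NOP();
    NOP();
    uint32_t end = dwt->cyccnt;
    return cycles_elapsed(start, end);
}
```

On the target, the function would be called with `(dwt_regs_t *)0xE0001000`. Taking the register block as a parameter is only a convenience so that the same code can be exercised against a fake block on a host machine.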

Here are the results of 5 executions of each function. The functions are executed in order from 0 to 7c, the whole sequence is repeated 5 times, and the results are saved in an array and printed afterwards:

test_dwt0 : 1 1 1 1 1 
test_dwt1 : 1 1 1 1 1 
test_dwt2 : 1 1 1 1 1 
test_dwt3 : 2 2 2 2 2 
test_dwt4 : 2 2 2 2 2 
test_dwt5 : 3 3 3 3 3 
test_dwt6 : 3 3 3 3 3 
test_dwt7 : 27 27 27 27 27 
test_dwt8 : 5 5 5 5 5 
test_dwt7b : 4 4 4 4 4 
test_dwt7c : 4 4 4 4 4
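The collection scheme described above (run every function once per pass, repeat for 5 passes, store everything, print at the end) could look roughly like the sketch below. It is an illustration under assumptions, not the code that produced these numbers: `test_entry_t`, `collect`, `print_results` and the two stub functions are hypothetical names, and the stubs merely stand in for the real 'test_dwt' functions so the loop can be tried on a host.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define N_RUNS 5

typedef uint32_t (*test_fn_t)(void);

typedef struct {
    const char *name;
    test_fn_t   fn;
} test_entry_t;

/* Host-side stand-ins for the real measurement functions. */
uint32_t stub_one(void)
{
    return 1u;
}

uint32_t stub_counter(void)
{
    static uint32_t calls = 0u;
    return calls++;  /* returns 0, 1, 2, ... on successive calls */
}

/* One pass calls every test once, in table order; N_RUNS passes in total.
 * Nothing is printed while measuring. */
void collect(const test_entry_t *tests, size_t n_tests,
             uint32_t results[][N_RUNS])
{
    for (size_t run = 0; run < N_RUNS; run++) {
        for (size_t t = 0; t < n_tests; t++) {
            results[t][run] = tests[t].fn();
        }
    }
}

/* Print one line per test, in the same layout as the output above. */
void print_results(const test_entry_t *tests, size_t n_tests,
                   uint32_t results[][N_RUNS])
{
    for (size_t t = 0; t < n_tests; t++) {
        printf("%s : ", tests[t].name);
        for (size_t run = 0; run < N_RUNS; run++) {
            printf("%lu ", (unsigned long)results[t][run]);
        }
        printf("\n");
    }
}
```

Deferring the printing until all passes are done keeps I/O out of the measured region and out of the gaps between measurements.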

The code of the last four functions:

34180824 <test_dwt7>:
34180824:	f3bf 8f6f 	isb	sy
34180828:	f3bf 8f4f 	dsb	sy
3418082c:	f8df 3018 	ldr.w	r3, [pc, #24]	@ 34180848 <.test_dwt7.dwt_base>
34180830:	6858 	ldr	r0, [r3, #4]
34180832:	bf00 	nop
34180834:	bf00 	nop
34180836:	bf00 	nop
34180838:	bf00 	nop
3418083a:	bf00 	nop
3418083c:	bf00 	nop
3418083e:	bf00 	nop
34180840:	6859 	ldr	r1, [r3, #4]
34180842:	eba1 0000 	sub.w	r0, r1, r0
34180846:	4770 	bx	lr

34180848 <.test_dwt7.dwt_base>:
34180848:	1000 	asrs	r0, r0, #32
3418084a:	e000 	b.n	3418084e <test_dwt8+0x2>

3418084c <test_dwt8>:
3418084c:	f3bf 8f6f 	isb	sy
34180850:	f3bf 8f4f 	dsb	sy
34180854:	4b06 	ldr	r3, [pc, #24]	@ (34180870 <.test_dwt8.dwt_base>)
34180856:	6858 	ldr	r0, [r3, #4]
34180858:	bf00 	nop
3418085a:	bf00 	nop
3418085c:	bf00 	nop
3418085e:	bf00 	nop
34180860:	bf00 	nop
34180862:	bf00 	nop
34180864:	bf00 	nop
34180866:	bf00 	nop
34180868:	6859 	ldr	r1, [r3, #4]
3418086a:	eba1 0000 	sub.w	r0, r1, r0
3418086e:	4770 	bx	lr

34180870 <.test_dwt8.dwt_base>:
34180870:	1000 	asrs	r0, r0, #32
34180872:	e000 	b.n	34180876 <test_dwt7b+0x2>

34180874 <test_dwt7b>:
34180874:	f3bf 8f6f 	isb	sy
34180878:	f3bf 8f4f 	dsb	sy
3418087c:	f8df 3018 	ldr.w	r3, [pc, #24]	@ 34180898 <.test_dwt7b.dwt_base>
34180880:	6858 	ldr	r0, [r3, #4]
34180882:	bf00 	nop
34180884:	bf00 	nop
34180886:	bf00 	nop
34180888:	bf00 	nop
3418088a:	bf00 	nop
3418088c:	bf00 	nop
3418088e:	bf00 	nop
34180890:	6859 	ldr	r1, [r3, #4]
34180892:	eba1 0000 	sub.w	r0, r1, r0
34180896:	4770 	bx	lr

34180898 <.test_dwt7b.dwt_base>:
34180898:	1000 	asrs	r0, r0, #32
3418089a:	e000 	b.n	3418089e <test_dwt7c+0x2>

3418089c <test_dwt7c>:
3418089c:	f3bf 8f6f 	isb	sy
341808a0:	f3bf 8f4f 	dsb	sy
341808a4:	4b06 	ldr	r3, [pc, #24]	@ (341808c0 <.test_dwt7c.dwt_base>)
341808a6:	6858 	ldr	r0, [r3, #4]
341808a8:	bf00 	nop
341808aa:	bf00 	nop
341808ac:	bf00 	nop
341808ae:	bf00 	nop
341808b0:	bf00 	nop
341808b2:	bf00 	nop
341808b4:	bf00 	nop
341808b6:	6859 	ldr	r1, [r3, #4]
341808b8:	eba1 0000 	sub.w	r0, r1, r0
341808bc:	4770 	bx	lr
	...

341808c0 <.test_dwt7c.dwt_base>:
341808c0:	1000 	asrs	r0, r0, #32
341808c2:	e000 	b.n	341808c6 <test_pmu0+0x2>

What is causing such a large delay in test_dwt7? (I observe a similar effect with the PMU.)

1 reply

acapola (Author)
Associate II
January 30, 2026

Update after learning about the ICB->ACTLR register:

SYS clock frequency = 100 MHz
CPU clock frequency = 100 MHz
ICache: 0, DCache: 0
ICB->ACTLR = 0x0803fcfc
test_dwt0 (1 expected): 1 1 1 1 1 
test_dwt1 (2 expected): 2 2 2 2 2 
test_dwt2 (3 expected): 3 3 3 3 3 
test_dwt3 (4 expected): 4 4 4 4 4 
test_dwt4 (5 expected): 5 5 5 5 5 
test_dwt5 (6 expected): 6 6 6 6 6 
test_dwt6 (7 expected): 7 7 7 7 7 
test_dwt7 (8 expected): 30 30 30 30 30 
test_dwt8 (9 expected): 9 9 9 9 9 
test_dwt7b (8 expected): 8 8 8 8 8 
test_dwt7c (8 expected): 8 8 8 8 8
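To make the deviation explicit, the small C sketch below compares the measured values from this update with the expected ones and reports the mismatches. The type and function names (`sample_t`, `count_outliers`) are hypothetical; the numbers are copied from the output above. It only quantifies the anomaly and does not explain it.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    expected;  /* value in parentheses in the output above */
    uint32_t    measured;  /* value printed for every one of the 5 runs */
} sample_t;

/* Data copied from the update above. */
const sample_t samples[] = {
    { "test_dwt0",  1u,  1u },
    { "test_dwt1",  2u,  2u },
    { "test_dwt2",  3u,  3u },
    { "test_dwt3",  4u,  4u },
    { "test_dwt4",  5u,  5u },
    { "test_dwt5",  6u,  6u },
    { "test_dwt6",  7u,  7u },
    { "test_dwt7",  8u, 30u },
    { "test_dwt8",  9u,  9u },
    { "test_dwt7b", 8u,  8u },
    { "test_dwt7c", 8u,  8u },
};
const size_t n_samples = sizeof samples / sizeof samples[0];

/* Returns how many samples differ from their expected value and stores
 * the index of the first such sample in *first (left untouched if none). */
size_t count_outliers(const sample_t *s, size_t n, size_t *first)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i].measured != s[i].expected) {
            if (count == 0) {
                *first = i;
            }
            count++;
        }
    }
    return count;
}
```

With this data, test_dwt7 is the only mismatch, and it exceeds its expected value by 22 cycles (30 measured against 8 expected).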