Visitor II

Question

Do HAL callbacks save registers r0-r3? Is it needed?

March 3, 2023
16 replies
3531 views

Hello,

We have random HardFaults on an STM32H750, maybe after several hours of uptime.

It seems to be caused by an access to a wrong RAM location, whose address is fetched from the stack and stored in r2.

But the stack seems OK.

This happens in a function frequently interrupted by a higher priority HAL DMA callback.

Inspecting it, AFAIK registers r0-r3 are not preserved by GCC when optimizing at -O2.

It could be that sometimes this interrupt takes place between the loading of r2 and its use, overwriting it with a wrong value.

Inside my callback function there are calls to other functions. Is this OK? I read somewhere that GCC saves only the registers it uses in the main function of the ISR and doesn't take care of register use in called functions. I don't know if that makes sense.

Adding __attribute__((interrupt)) does not seem to make any difference.

For now I added push and pop of the scratch registers in the callback and it seems to work, but I'm not completely sure because a slight timing difference could be enough to mask the problem.

It is also inconvenient, because the HAL library needs to be patched.

I am not convinced of anything I wrote before because it would break most code and it would have been spotted long ago.

Any comment?

Thanks and regards.

Alberto

waclawek.jan

Super User

> the interrupt could change the value of R2

As the hardware stores/restores R2 at interrupt entry/exit, the only way this could happen would be if the interrupt erroneously wrote to the stack. The likelihood of this is lower than that of a zillion other causes, the first of which is a straightforward user bug.
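
For reference, a minimal sketch of the frame the hardware pushes on exception entry on a Cortex-M core. It covers r0-r3, r12, LR, PC and xPSR, the same set of registers the AAPCS lets an ordinary C function clobber, which is why a handler can be plain C. The names below (`FRAME_R2`, `exc_frame_words`, `exc_stacked_r2`) are made up for illustration and are not part of CMSIS or HAL.

```c
#include <stdint.h>
#include <stddef.h>

/* Word offsets inside the basic hardware-stacked frame, lowest address first.
 * Illustrative names, not taken from CMSIS or HAL. */
enum {
    FRAME_R0 = 0,
    FRAME_R1,
    FRAME_R2,
    FRAME_R3,
    FRAME_R12,
    FRAME_LR,     /* LR of the interrupted code */
    FRAME_PC,     /* return address */
    FRAME_XPSR,
    FRAME_BASIC_WORDS   /* = 8 */
};

/* EXC_RETURN bit 4: 1 = basic 8-word frame,
 * 0 = extended 26-word frame that also holds S0-S15 and FPSCR. */
static inline size_t exc_frame_words(uint32_t exc_return)
{
    return (exc_return & (1u << 4)) ? (size_t)FRAME_BASIC_WORDS : 26u;
}

/* Read the stacked r2, given the stack pointer value right at exception entry
 * (before the handler's own prologue has pushed anything). */
static inline uint32_t exc_stacked_r2(const uint32_t *sp_at_entry)
{
    return sp_at_entry[FRAME_R2];
}
```

Note that `sp_at_entry` is the SP before any compiler-generated prologue runs; reading SP from inside a C handler gives a value that is already offset by whatever the prologue pushed.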

While r2 might've been loaded at 0x0800ffd8, there's an unconditional jump at 0x0800ffee, so there's some code executed until a jump to 0x0800fff0, and we don't see that code. It may or may not modify r2, directly or indirectly.

JW

AlbertoGarlassi (Author)

Visitor II

I placed two breakpoints at the lines below the unconditional jump and they are never hit, even when a hard fault occurs. I don't know how to interpret this; it seems to be dead code, and I don't know how likely that is. It comes precompiled from ST's CMSIS.

Meanwhile I'm trying to inspect the stack at top priority ISR entry and exit.

/******************************************************************************/
/* STM32H7xx Peripheral Interrupt Handlers */
/* Add here the Interrupt Handlers for the used peripherals. */
/* For the available peripheral interrupt handler names, */
/* please refer to the startup file (startup_stm32h7xx.s). */
/******************************************************************************/
 
/**
 * @brief This function handles DMA1 stream0 global interrupt.
 */
void DMA1_Stream0_IRQHandler(void)
{
  /* USER CODE BEGIN DMA1_Stream0_IRQn 0 */

  /* Global variables used for instrumenting, defined as:
   *   volatile uint32_t StackInR2, StackOutR2;
   */

  /* Magic number 6 is needed because there are some stack pushes
   * before reaching this line. */
  register char *stack_ptr asm("sp");
  StackInR2 = *((uint32_t *)stack_ptr + 6);
  /* USER CODE END DMA1_Stream0_IRQn 0 */

  HAL_DMA_IRQHandler(&hdma_adc1);

  /* USER CODE BEGIN DMA1_Stream0_IRQn 1 */
  StackOutR2 = *((uint32_t *)stack_ptr + 6);
  if (StackInR2 != StackOutR2)
  {
    /* This is a placeholder for inserting a breakpoint. */
    StackInR2 = StackInR2;
  }
  /* USER CODE END DMA1_Stream0_IRQn 1 */
}

I'm clearly hallucinating because on first execution the stack is correctly loaded with r0-r3, but, later on, these stack entries are never updated. Debugger shows that sp stays the same and r0-r3 change, for subsequent executions. I would expect those locations to follow r0-r3, but the debugger too shows this is not happening.

Dcache has been disabled.

I'm checking tail chaining. I have never dealt with it, but from what I read this feature could skip the register save. I can't see how this could apply to this case.

There are some messages from other poor fellows that have experienced hard faults with CMSIS fft.

waclawek.jan

Super User

My bad, I overlooked that the unconditional jump just jumps over those two "instructions" - and they are not instructions, they form one literal word (i.e. a constant read by the program somewhere, not executable instructions). That's why breakpoints won't work there.

I looked at the library and this is the beginning of a loop, so there's a jump back to the same point from further on in the code. And it gets incremented by r3 in the meantime. This still does not explain the r2 corruption.

Try to run until the hardfault, and show us the content of the registers and stack as they are in the hardfault, without walking back in the debugger.
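
One common way to get that snapshot is to have the HardFault handler copy the hardware-stacked frame into globals before anything else runs. What follows is a hedged sketch, not code from this project: `fault_snapshot_t`, `g_fault_snapshot`, `g_fault_captured` and `hardfault_capture` are hypothetical names, and the naked wrapper would replace the default `HardFault_Handler` generated in the interrupt file. It assumes GCC and a debugger being attached (the `bkpt` halts there).

```c
#include <stdint.h>

/* Copy of the eight words stacked by hardware at fault entry.
 * All names here are illustrative. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_snapshot_t;

volatile fault_snapshot_t g_fault_snapshot;
volatile uint32_t g_fault_captured;

/* `frame` points at the stacked r0 (lowest address of the frame). */
void hardfault_capture(const uint32_t *frame)
{
    g_fault_snapshot.r0   = frame[0];
    g_fault_snapshot.r1   = frame[1];
    g_fault_snapshot.r2   = frame[2];
    g_fault_snapshot.r3   = frame[3];
    g_fault_snapshot.r12  = frame[4];
    g_fault_snapshot.lr   = frame[5];
    g_fault_snapshot.pc   = frame[6];   /* address of the faulting instruction */
    g_fault_snapshot.xpsr = frame[7];
    g_fault_captured = 1u;
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* Target-only wrapper: pick MSP or PSP from EXC_RETURN bit 2, pass it in r0,
 * then stop so the snapshot can be inspected. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4            \n"
        "ite   eq                \n"
        "mrseq r0, msp           \n"
        "mrsne r0, psp           \n"
        "bl    hardfault_capture \n"
        "bkpt  #0                \n"
        "b     .                 \n"
    );
}
#endif
```

Because the wrapper is naked, SP is read before any prologue runs, so no magic offset is needed to find the stacked r2.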

JW

Tesla DeLorean

Graduate II

Tail chaining is where it maintains a dirty register context across multiple IRQ handlers which have NO expectations on initial register values at entry.

When all pending interrupts are cleared the register context is restored and the processor returns to where it left off.
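
The behaviour described above can be illustrated with a toy model in plain C. This is only a sketch of the idea, not a model of the real core: `toy_cpu_t`, `toy_service_pending` and the two handlers are invented for the example, and only r0-r3 are tracked.

```c
#include <stdint.h>

/* Toy CPU: four scratch registers and a small stack (index grows upward
 * here purely for simplicity). Everything in this block is illustrative. */
typedef struct {
    uint32_t r[4];       /* r0-r3 */
    uint32_t stack[16];
    int      sp;         /* index of the next free slot */
    int      pushes;     /* number of times a frame was stacked */
} toy_cpu_t;

typedef void (*toy_handler_t)(toy_cpu_t *cpu);

static void toy_push_frame(toy_cpu_t *cpu)
{
    for (int i = 0; i < 4; i++) {
        cpu->stack[cpu->sp++] = cpu->r[i];
    }
    cpu->pushes++;
}

static void toy_pop_frame(toy_cpu_t *cpu)
{
    for (int i = 3; i >= 0; i--) {
        cpu->r[i] = cpu->stack[--cpu->sp];
    }
}

/* Service all pending handlers back to back: the frame is stacked once,
 * handlers run on whatever register values the previous one left behind,
 * and the frame is unstacked once at the end. */
void toy_service_pending(toy_cpu_t *cpu, const toy_handler_t *pending, int count)
{
    if (count <= 0) {
        return;
    }
    toy_push_frame(cpu);
    for (int i = 0; i < count; i++) {
        pending[i](cpu);           /* tail-chained: no pop/push in between */
    }
    toy_pop_frame(cpu);
}

/* Two handlers that freely clobber the scratch registers. */
void toy_handler_a(toy_cpu_t *cpu) { cpu->r[2] = 0xAAAAAAAAu; }
void toy_handler_b(toy_cpu_t *cpu) { cpu->r[2] = 0xBBBBBBBBu; cpu->r[0] = 0u; }
```

In this model the interrupted code gets its original r0-r3 back no matter how many handlers ran in between, and the stacked copy keeps holding the interrupted code's values rather than following the handlers' registers.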

AlbertoGarlassi (Author)

Visitor II

After many tests and much reasoning, nothing is clear.

We ended up including the relevant CMSIS code in our project instead of linking the precompiled lib.

No hard faults anymore, but this could just be the result of moving things around while the root cause is still there.

Anyway that's the best we could do for now.

Every time we had a hard fault it was caused by line 182 of arm_cfft_radix8_f32.c

Hard to believe that a timing issue or stack corruption or whatever always kicks in exactly at that line.

I will report if there is any news.

Thanks to everybody.

Alberto

Piranha

Graduate II

Are you using an RTOS, particularly FreeRTOS? V10.5.0 recently fixed a pretty nasty bug.

What about potential stack overflows?

Otherwise it seems that the issue could also be related to ABI settings for the compiler, runtime, floating point or some other library.
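
On the stack overflow question, a common check is to paint the stack region with a known pattern at startup and later look at how much of the pattern survived. A minimal sketch follows, assuming a descending stack whose lowest address is `base`. The function names and the pattern value are arbitrary choices for illustration, and on a real target the painting must only cover the part of the stack below the current SP.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_PAINT 0xDEADBEEFu   /* arbitrary fill pattern */

/* Fill `words` words starting at the lowest stack address with the pattern. */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* Count the words at the low end that still hold the pattern.
 * A descending stack overwrites from the high end, so this is the headroom
 * that was never used; 0 means the stack reached (or passed) its limit. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

Calling `stack_unused_words` from the fault handler, or periodically from the main loop, gives a high-water mark that does not depend on what SP happens to be at the moment of inspection.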

AlbertoGarlassi (Author)

Visitor II

No, we don't use an RTOS. Thanks for pointing out this bug, I will check if somehow it could be relevant.

It looks like it is not a stack overflow. The stack and SP register seem to be OK after the hard fault. But there could be something that the debugger is hiding, like a cache coherence issue. DMA should not use the RAM potentially involved, but who knows.

I'm writing a separate project to exercise the FFT routine in CMSIS.

Regards
