Graduate
March 30, 2024
Solved

Random hardfault on STM32F4


Hello,

I held off on posting because I wanted to ask a clear question once I had found the source of the problem. However, I have been working on this nightly for two weeks now and still can't pinpoint what is going wrong or where. No matter what I do, getting my STM32F4 to do anything useful causes a fault anywhere from 5 seconds to 5 minutes later.

Code:

https://github.com/DerekSavage1/Word-Clock-Rev-3

Hardware:

- STM32F411CE on custom circuit board (can provide schematics)

- 24 MHz external crystal

- 32.768 kHz external RTC crystal

The steps that I have taken:
- Commented out sections of code until it worked. The program will only work if the while loop is empty or only declares a variable. 

- Converted most arrays to switch statements to avoid memory errors

- Enabled pedantic warnings with -Wpedantic
- Rewrote the matrix logic in a file on my computer without HAL calls and checked with all warning flags and ASan

- Stepped through the code in debug mode. Never found the source of the crashes as it takes multiple loops to cause a fault.

- Ordered an STM32F4 board from Amazon to see whether it also hard faults on a differently designed board. It should arrive within a few days.

- Increased the stack size from 0x400 to 0x800, then 0x1200, then 0x10000. Same issue.
- Checked the fault analyzer and stack trace every time. They almost always look like this:

#0 HardFault_Handler () at ../Core/Src/stm32f4xx_it.c:87

#1 <signal handler called>

#2 0x00000000 in ?? ()

#3 0x08001324 in activateDigit (digit=113 'q') at ../Drivers/Numeric_Display/Numeric_Display.c:32

Backtrace stopped: previous frame inner to this frame (corrupt stack?)
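
The `0x00000000 in ?? ()` frame suggests the core tried to execute from address zero, which can happen through a null or corrupted function pointer or return address. On exception entry a Cortex-M core pushes eight words (r0-r3, r12, lr, pc, xPSR) onto the active stack, so a fault handler can inspect them. Below is a minimal sketch of that decoding. The names `StackedFrame`, `decode_frame` and `pc_outside_flash` are hypothetical and not part of the project or the HAL, and the frame is passed in as a plain array so the logic can be tried on a PC. On the target, the pointer would come from MSP or PSP.

```c
#include <stdint.h>

/* Order in which a Cortex-M core pushes registers on exception entry. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} StackedFrame;

/* Copy the eight stacked words into a named structure (hypothetical helper). */
static StackedFrame decode_frame(const uint32_t *sp)
{
    StackedFrame f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* Returns 1 when the stacked PC lies outside the given flash range, else 0. */
static int pc_outside_flash(const StackedFrame *f, uint32_t base, uint32_t size)
{
    return (f->pc < base || f->pc - base >= size) ? 1 : 0;
}
```

Feeding it an illustrative frame with pc = 0x00000000 and a flash range of 0x08000000 plus 512 KB reports the PC as outside flash, which matches what frame #2 of the backtrace hints at. The stacked lr then points back at the code that made the bad jump.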


Some nights I would get an idea: what if it's because of X, or what if I haven't looked at Y?
However, last night I had only one thing in my loop: a function call. I had tested the function on my own computer with no errors. I tried again today, and the function appeared to work fine; it would only fault when called in conjunction with HAL_RTC_GetTime() and HAL_RTC_GetDate().

 

While it is possible that I could strip out more code to illustrate a minimum viable fault, this is an example of how little is in my main function:

 

int main(void)
{
 HAL_Init();
 SystemClock_Config();

 MX_GPIO_Init();
 MX_DMA_Init();
 MX_TIM1_Init();
 MX_RTC_Init();
 MX_TIM3_Init();

 HAL_TIM_Encoder_Start(&htim3, TIM_CHANNEL_ALL); // Start the encoder interface

 while (1)
 {
  HAL_RTC_GetTime(&hrtc, &sTime, RTC_FORMAT_BIN);
  HAL_RTC_GetDate(&hrtc, &sDate, RTC_FORMAT_BIN);

  displayTime(sTime.Hours, sTime.Minutes, color, brightness);
  // DMA_Send(&htim1);
 }
}

 

Even though I had tested displayTime() on my machine, I decided to comment out most of the function, and it would still fault:

 

void displayTime(uint8_t hours, uint8_t minutes, uint32_t color, uint8_t brightness) {
 color = 0x404040;
 // everything else is commented out
}

 

 

The reason I have been saying "fault" instead of naming a specific type is that the type, too, is different each time: invalid instruction, stack error, etc. I can get a list of them if needed.
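
Since the fault type changes from run to run, it can help to log the Configurable Fault Status Register each time and compare the flags. Below is a minimal sketch of a decoder for the Cortex-M4 CFSR bit layout. The names `CfsrFlag`, `cfsr_flags` and `decode_cfsr` are hypothetical, and the value is passed in as a plain integer so the code can be tried on a PC. On the target, the value would be read from `SCB->CFSR` inside the fault handler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One named flag of the Cortex-M4 Configurable Fault Status Register. */
typedef struct {
    uint32_t mask;
    const char *name;
} CfsrFlag;

static const CfsrFlag cfsr_flags[] = {
    /* MemManage fault status (bits 0..7) */
    { 1u << 0,  "IACCVIOL" },   { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },  { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },    { 1u << 7,  "MMARVALID" },
    /* BusFault status (bits 8..15) */
    { 1u << 8,  "IBUSERR" },    { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" },{ 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },     { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    /* UsageFault status (bits 16..31) */
    { 1u << 16, "UNDEFINSTR" }, { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },      { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },  { 1u << 25, "DIVBYZERO" },
};

/* Write the names of all set flags into out, space separated.
 * Returns the number of names written; stops early if out is too small. */
static int decode_cfsr(uint32_t cfsr, char *out, size_t out_size)
{
    int count = 0;
    size_t used = 0;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) == 0) {
            continue;
        }
        int n = snprintf(out + used, out_size - used, "%s%s",
                         count ? " " : "", cfsr_flags[i].name);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0'; /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

A value of 0x00020000, for example, decodes to INVSTATE, and 0x00008200 decodes to PRECISERR together with BFARVALID. In the latter case the faulting address is available in the BFAR register.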

Is it possible that I have configured something incorrectly or there is an error in my pre-generated code that is creating a memory error? My guess is that the code has been stomped on by an initialization which causes actions like reading encoder values, function calls, and RTC calls to go wrong.

 

I have an oscilloscope if you would like me to probe something.

 

Any help would be greatly appreciated.

    Best answer by AScha.3

    VCAP should be 4.7 µF.

     


     

    2 replies

    Technical Moderator
    March 30, 2024

    Dear @DerekSavage,

     

     

    Maybe I overlooked them in the GitHub files, but can you please share the schematics and PCB layout, in particular the power pins, VCAP and its associated capacitors, and the datasheet for the 32 kHz crystal? I see the system clock is set to PLL using HSI and not HSE.

    Cheers,

    ST1
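
The HSI-versus-HSE remark above matters because the PLL dividers have to match whichever oscillator feeds the PLL. Below is a minimal sketch that computes SYSCLK from the PLLM, PLLN and PLLP values and rejects combinations outside the limits I believe apply to the STM32F411 (VCO input 1 to 2 MHz, VCO output 100 to 432 MHz, SYSCLK at most 100 MHz). Treat those limits as assumptions to check against the reference manual. The names `PllConfig` and `pll_sysclk_hz` are hypothetical and do not come from the project or the HAL.

```c
#include <stdint.h>

typedef struct {
    uint32_t src_hz; /* PLL input: HSI (16 MHz) or the HSE crystal frequency */
    uint32_t m;      /* PLLM: input divider */
    uint32_t n;      /* PLLN: VCO multiplier */
    uint32_t p;      /* PLLP: SYSCLK divider (2, 4, 6 or 8) */
} PllConfig;

/* Returns SYSCLK in Hz, or 0 if any assumed limit is violated. */
static uint32_t pll_sysclk_hz(const PllConfig *c)
{
    if (c->m < 2 || c->m > 63) return 0;
    if (c->n < 50 || c->n > 432) return 0;
    if (c->p != 2 && c->p != 4 && c->p != 6 && c->p != 8) return 0;

    uint32_t vco_in = c->src_hz / c->m;            /* integer division */
    if (vco_in < 1000000u || vco_in > 2000000u) return 0;

    uint64_t vco_out = (uint64_t)vco_in * c->n;
    if (vco_out < 100000000u || vco_out > 432000000u) return 0;

    uint32_t sysclk = (uint32_t)(vco_out / c->p);
    return (sysclk <= 100000000u) ? sysclk : 0;
}
```

With a 16 MHz HSI, M = 8, N = 100, P = 2 gives 100 MHz. The same dividers with a 24 MHz crystal put 3 MHz on the VCO input and are rejected, while M = 12, N = 96, P = 2 gives 96 MHz from the crystal.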

     

    Graduate
    March 30, 2024

    Yes I can. I am at work at the moment, but I will be home in a few hours and will share them.

    Super User
    March 30, 2024

    Hi,

    One thing you didn't say: can you run a small loop that toggles an output (check it with an LED or a scope)?

    Does such a simple, small program run fine (for hours)?

    This is just to be sure it's not a hardware problem, like spikes on the supply...

    If that is OK, your problem seems to be the LSE clock.

    Try switching to the LSI and see if it changes anything.

    - Leave out anything related to the RTC or LSE, to test whether the fault has to do with them.