Graduate
August 14, 2024
Solved

How to diagnose a Hard Fault Exception on STM32F407IGT


Hello

After the code has run for about one to two hours, I always get a Hard Fault exception. The fault registers, read out in the while loop of the Hard Fault handler, are:

HFSR=0x4000 0000

CFSR=0x8200

BFAR=0x20020000

MMFAR=0x20020000

AFSR=0

 

Readout of the SP register shows:

SP=0x2001ff40

*(SP)=8

*(SP-1)=8

*(SP-2)=1

*(SP-3)=2

*(SP-4)=2

*(SP-5)=2

*(SP-6)=0

*(SP-7)=0

*(SP-8)=0x2001ffc0

*(SP-9)=0x8012ae8

What is going on here? How can I recover properly from this situation?

 


    5 replies

    Graduate II
    August 14, 2024

    KB: How to debug a HardFault on an Arm Cortex®-M STM32 

    https://interrupt.memfault.com/blog/cortex-m-hardfault-debug

     

    You can use the hard fault analyzer integrated in CubeIDE to get a friendlier view of the fault state.

    You can use the CubeIDE build analyzer to find which function lives at a certain address (this doesn't require an active debug session, unlike the disassembly view).

     

     

    Possibly (if I've decoded the data correctly), you have a divide-by-zero error occurring at 0x8012ae8.

    SMali.3 (Author)
    Graduate
    August 14, 2024

    Thanks for the fast reply.

    I do not use CubeIDE for this project, I use Atollic TrueSTUDIO.

    How did you get the idea that it is a divide-by-zero problem?

    I mean:

    HFSR=0x4000 0000 -> I have a FORCED hard fault

    CFSR=0x0000 8200 -> PRECISERR and BFARVALID, which means a precise data bus error occurred and the address in BFAR is valid

    BFAR=0x20020000
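The decoding above can be checked mechanically. The following is a minimal host-side sketch in C, not code from the project: the names `fault_bit_t`, `hfsr_bits`, `cfsr_bits` and `decode_bits` are made up for illustration, while the bit positions follow the ARMv7-M fault status register layout. Fed with the values from this thread, it reports FORCED for HFSR and PRECISERR plus BFARVALID for CFSR. MMARVALID (bit 7) is not set, so the MMFAR readout should not be treated as a valid fault address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One named flag inside a fault status register. */
typedef struct {
    uint32_t    mask;
    const char *name;
} fault_bit_t;

/* HardFault Status Register flags (ARMv7-M). */
static const fault_bit_t hfsr_bits[] = {
    { 1u << 1,  "VECTTBL"  },
    { 1u << 30, "FORCED"   },
    { 1u << 31, "DEBUGEVT" },
};

/* Configurable Fault Status Register flags, in ascending bit order. */
static const fault_bit_t cfsr_bits[] = {
    /* MemManage (MMFSR, bits 0..7) */
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"  },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"   },
    { 1u << 5,  "MLSPERR"     }, { 1u << 7,  "MMARVALID" },
    /* BusFault (BFSR, bits 8..15) */
    { 1u << 8,  "IBUSERR"     }, { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR"  },
    { 1u << 12, "STKERR"      }, { 1u << 13, "LSPERR"    },
    { 1u << 15, "BFARVALID"   },
    /* UsageFault (UFSR, bits 16..31) */
    { 1u << 16, "UNDEFINSTR"  }, { 1u << 17, "INVSTATE"  },
    { 1u << 18, "INVPC"       }, { 1u << 19, "NOCP"      },
    { 1u << 24, "UNALIGNED"   }, { 1u << 25, "DIVBYZERO" },
};

#define HFSR_BIT_COUNT (sizeof hfsr_bits / sizeof hfsr_bits[0])
#define CFSR_BIT_COUNT (sizeof cfsr_bits / sizeof cfsr_bits[0])

/* Writes the names of all set flags into out, separated by spaces,
 * and returns how many flags were set. out_len must be at least 1. */
static int decode_bits(uint32_t value, const fault_bit_t *table, size_t count,
                       char *out, size_t out_len)
{
    int found = 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (value & table[i].mask) {
            if (found > 0) {
                strncat(out, " ", out_len - strlen(out) - 1);
            }
            strncat(out, table[i].name, out_len - strlen(out) - 1);
            found++;
        }
    }
    return found;
}
```

Calling `decode_bits(0x00008200u, cfsr_bits, CFSR_BIT_COUNT, buf, sizeof buf)` with a `char buf[128]` fills `buf` with the two BusFault flag names. On the target, the same tables could be used by a fault handler that logs over a UART, but that part is left out here.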

    I assume there was an access to this location, presumably a read. In my linker .ld file I have: _estack = 0x20020000

    Is there a connection between the two?

    Also, I do not have any code at address 0x8012ae8. According to the .list file and the settings in the .ld file, my code starts at 0x08020000.
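On the connection between BFAR and _estack: a small host-side sketch, assuming the usual STM32F407 memory map with 128 KB of main SRAM starting at 0x20000000 (which is consistent with the _estack value quoted above). The macro and function names are hypothetical. Under that assumption, _estack is the first address past the end of SRAM. A full-descending stack never touches _estack itself (the first push lands at _estack - 4), so a fault address of 0x20020000 would suggest that something read or wrote just above the top of RAM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map: 128 KB of main SRAM at 0x20000000. */
#define MAIN_SRAM_BASE 0x20000000u
#define MAIN_SRAM_SIZE (128u * 1024u)

/* First address past the end of SRAM; matches _estack from the post. */
#define MAIN_SRAM_END  (MAIN_SRAM_BASE + MAIN_SRAM_SIZE)

/* Returns true if the access [addr, addr + len) lies entirely in main SRAM.
 * Written without addr + len so the check cannot overflow. */
static bool in_main_sram(uint32_t addr, uint32_t len)
{
    if (addr < MAIN_SRAM_BASE || len > MAIN_SRAM_SIZE) {
        return false;
    }
    return (addr - MAIN_SRAM_BASE) <= (MAIN_SRAM_SIZE - len);
}
```

With these definitions, a word access at the reported BFAR (0x20020000) is out of range, while the reported SP (0x2001ff40) and the last word of SRAM (0x2001fffc) are in range.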

    Graduate II
    August 14, 2024

    How did you get the idea that it is a divide-by-zero problem?

    My mistake. I searched for the CM4 CFSR bit definitions but got the CM3 page instead.

    Super User
    August 14, 2024

    TrueSTUDIO has a fault analyzer, the same as in CubeIDE.

    I do not have any code on address 0x8012ae8. 

    This likely is the culprit. Stack overwrite?

    Graduate II
    August 14, 2024

    Isn't your stack dump showing the wrong addresses? The stack (on Cortex-M4) grows downwards. If you want to see what was pushed on the stack by the exception (especially the PC), you should be looking at SP+n, not at SP-n. That's why the only value that looks like a code address doesn't make sense. The PC should be available at *((uint32_t*)SP + 6), unless I'm wrong again.

     

    That's why it's simpler to just make use of the Hard fault analyzer / GUI debugger, avoiding all these easy-to-make mistakes.
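For anyone reading the frame by hand anyway, here is a minimal sketch of the layout described above. It assumes the basic eight-word frame (no floating-point state stacked) and that the pointer passed in is the stack pointer as it was at exception entry, before the handler's own prologue pushed anything. The enum and function names are made up for illustration. Words at addresses below that pointer are only leftovers from earlier calls, which is why a dump at SP-n shows nothing useful.

```c
#include <stdint.h>

/* Word offsets of the registers pushed by the core on exception entry,
 * counted upwards from the stack pointer at entry (basic frame). */
enum {
    FRAME_R0   = 0,
    FRAME_R1   = 1,
    FRAME_R2   = 2,
    FRAME_R3   = 3,
    FRAME_R12  = 4,
    FRAME_LR   = 5,
    FRAME_PC   = 6,  /* address of the faulting (or next) instruction */
    FRAME_XPSR = 7
};

/* Returns the stacked program counter from an exception frame. */
static uint32_t stacked_pc(const uint32_t *sp_at_entry)
{
    return sp_at_entry[FRAME_PC];
}

/* Returns the stacked link register from an exception frame. */
static uint32_t stacked_lr(const uint32_t *sp_at_entry)
{
    return sp_at_entry[FRAME_LR];
}
```

On the target, the pointer would come from MSP or PSP, depending on which stack was active when the fault was taken. Looking up the stacked PC in the .list file or map file then points to the instruction involved.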

    SMali.3 (Author)
    Graduate
    August 14, 2024

    OK, I did it wrong: I decremented instead of incrementing. I will correct that in my code.

    Yes, I will continue debugging this problem with the fault analyzer. I did not even know that such a tool exists. Thanks to all of you for sharing this with me.

    I will be able to work on the system on Friday, and I hope to have more information about this exception then.

    Super User
    August 14, 2024

    Looking at the call stack when the error happens can give you insight into whether it's a stack overflow. If stack variables are corrupted, an out-of-bounds write is likely at fault.
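One common way to check for a stack overflow without a debugger attached is stack painting: fill the stack region with a known pattern at startup and later count how much of it is still intact. The sketch below is a host-side illustration under stated assumptions, not code from this project. The names `STACK_FILL_PATTERN`, `stack_paint` and `stack_unused_words` are hypothetical, and on a real target the region bounds would come from the linker script, with only the part below the current stack pointer being painted.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary pattern that is unlikely to appear in real stack data. */
#define STACK_FILL_PATTERN 0xDEADBEEFu

/* Fills the stack region, starting at its lowest address, with the pattern. */
static void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL_PATTERN;
    }
}

/* Counts untouched words from the lowest address upwards. Because the stack
 * grows downwards, this is the headroom that was never used. A result of 0
 * means the stack reached (or ran past) the bottom of its region. */
static size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t unused = 0;
    while (unused < words && lowest[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused;
}
```

Checking the result periodically, or from the fault handler, gives a rough high-water mark. It cannot prove that no overflow happened, since a function may skip over words without writing them, but a headroom of zero is a strong hint.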

    Does your code do dynamic memory allocation? (malloc/free)

    SMali.3 (Author)
    Graduate
    August 14, 2024

    I allocate a small amount of memory with malloc at the initialization stage, and it is never released.

    SMali.3 (Author, best answer)
    Graduate
    October 2, 2024

    Problem with this exception was solved.

    The cause was EMC interference from a DC/DC converter in close proximity to the board with this microcontroller. After replacing the DC/DC converter with another one, the problem was gone.

     

    Graduate II
    October 2, 2024

    Are you sure it was EMI? Excessive switching noise on the SMPS output could also cause glitches, for example.

    SMali.3 (Author)
    Graduate
    October 2, 2024

    I assume it is EMI because the power supply in question had no direct connection to the microcontroller except for ground. The microcontroller is supplied from another SMPS, which has worked fine with the component for more than a decade.

    I did not measure with a spectrum analyzer for lack of time. Maybe I will do that at some point in the future.

    Graduate II
    October 2, 2024

    Fair. Would have been interesting to verify by shielding with an improvised can and seeing if the issue went away.