Explorer II
May 9, 2025
Question

STM32: avoid deadloops when intercepting (expected) ECC errors while reading flash memory

  • May 9, 2025
  • 8 replies
  • 1255 views

I'm using STM32 with the HAL and LL drivers (H7 and G4 families in particular, but I think this can be a general question), and I'm trying to avoid being forced into recursive faults when reading a flash location with broken ECC.

In my application, a read from a broken flash location can happen.

In TrapHandler I'm able to intercept the error, report it to the flash driver, clear the error flags, and avoid any reset.

However, when returning from TrapHandler, execution falls back on the same flash instruction that generated the fault; it reads the same location again, another fault is generated, and so on in a loop.

Is there a way to continue with the execution in a portable way after encountering this fault?

 

For a deeper understanding, this is one of the specific use cases in which I would need the above behavior:

In the bootloader, before jumping to the application, I calculate the CRC of the application flash, compare it with the one stored in another flash location, and jump to the application only if they match.

However, a flash location may be broken (e.g. due to an error in the application, or to a sudden power loss while writing/erasing), generating an ECC error.

When this error occurs, I report the fault to the flash driver, which makes the CRC calculation fail, and I would like my algorithm to proceed.

However, even though I'm correctly detecting the error, clearing the flags and returning from the fault handler, the flash driver tries to read the broken location again, another fault is immediately generated, and I get stuck on the same memcpy instruction forever, without being able to proceed.
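A minimal sketch of the CRC gate described above, assuming a read callback that returns false whenever the flash driver reports an ECC error. The callback type `FlashReadFn`, the fake flash used to try it on a host, and the choice of CRC-32/IEEE are all assumptions (use whatever CRC the bootloader already stores). A failed read is treated exactly like a CRC mismatch, so the bootloader simply refuses to jump; this assumes the read itself already returns instead of re-faulting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320): small, not fast.
 * Chainable: crc32_update(crc32_update(0, a), b) == CRC of a followed by b. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len) {
    crc = ~crc;
    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical read interface: false means an ECC error was reported. */
typedef bool (*FlashReadFn)(uint32_t addr, uint32_t len, uint8_t *buf);

/* True only if every chunk was read without error and the CRC matches. */
bool app_image_valid(FlashReadFn read, uint32_t start, uint32_t len,
                     uint32_t expected) {
    uint8_t chunk[64];
    uint32_t crc = 0u;
    for (uint32_t off = 0u; off < len; ) {
        uint32_t n = (len - off < sizeof chunk) ? (len - off)
                                                : (uint32_t)sizeof chunk;
        if (!read(start + off, n, chunk)) {
            return false;  /* ECC error: treat the image as invalid */
        }
        crc = crc32_update(crc, chunk, n);
        off += n;
    }
    return crc == expected;
}

/* Host stand-in for flash: addresses are offsets into a RAM image, and one
 * address can be marked as "ECC-broken". */
const uint8_t *g_fakeFlash;
uint32_t g_fakeBadAddr = UINT32_MAX;

bool fake_flash_read(uint32_t addr, uint32_t len, uint8_t *buf) {
    if (g_fakeBadAddr >= addr && g_fakeBadAddr - addr < len) {
        return false;  /* this chunk covers the broken location */
    }
    memcpy(buf, g_fakeFlash + addr, len);
    return true;
}
```

On target, `read` would wrap the driver's Fls_Read, and `start`, `len` and `expected` would come from the application's flash layout.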

 

This is simplified code for the flash read; Fls_SetEccError is called inside TrapHandler to report errors to the flash driver. However, if an ECC error is encountered, I get stuck forever in the memcpy call.

To break the flash ECC, I can perform two writes with different values on the same flash location and then try to read it.

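#include <string.h>

volatile bool_t Fls_bEccError = FALSE;  /* set from TrapHandler, hence volatile */

bool_t Fls_Read( uint32_t u32StartAddress, uint32_t u32Length, uint8_t *pu8Buffer ) {
    Fls_bEccError = FALSE;  /* forget errors from earlier reads */
    memcpy( pu8Buffer, ( const void * )u32StartAddress, u32Length );
    return !Fls_bEccError;  /* TRUE if no ECC error was reported during the copy */
}

void Fls_SetEccError( void ) {
    Fls_bEccError = TRUE;
}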

Thank you in advance for your help


    Graduate II
    May 9, 2025

    Not sure there's an easy answer. You can check the stack frame to recover the context and advance the instruction pointer based on the opcode, or recognize that the fault happened in this particular loop.
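A sketch of the "advance the instruction pointer based on the opcode" idea, assuming the stacked PC points at the faulting instruction (the thread notes further down that this is not guaranteed) and that the instruction is not inside an IT block. Thumb-2 instructions are 2 or 4 bytes, and the length can be decoded from the first halfword. Skipping a load leaves its destination register with stale content, so the copied data must be treated as invalid. The function names are illustrative.

```c
#include <stdint.h>

/* Index of the stacked PC in the hardware exception frame (R0-R3, R12, LR,
 * PC, xPSR); the same offset applies to the extended frame with FPU state. */
#define STACKED_PC_IDX 6

/* Size in bytes of the Thumb instruction whose first halfword is 'hw'.
 * A 32-bit Thumb-2 encoding starts with bits [15:11] = 0b11101, 0b11110
 * or 0b11111; every other value is a 16-bit instruction. */
uint32_t thumb_insn_size(uint16_t hw) {
    uint32_t top5 = (uint32_t)hw >> 11;
    return (top5 >= 0x1Du) ? 4u : 2u;
}

/* Advance the stacked PC past the faulting instruction so the exception
 * return does not retry it. 'first_halfword' is the opcode at the stacked
 * PC; on target: *(const uint16_t *)(uintptr_t)frame[STACKED_PC_IDX]. */
void frame_skip_insn(uint32_t *frame, uint16_t first_halfword) {
    frame[STACKED_PC_IDX] += thumb_insn_size(first_halfword);
}
```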

    Super User
    May 9, 2025

    Consider resetting instead, and using a magic value in memory which indicates that an error was encountered during a read of flash location X.
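A minimal sketch of that approach, assuming a RAM region that the startup code does not clear (the `.noinit` section name depends on your linker script and is an assumption, as are all names here). The handler records the address and then resets (e.g. with CMSIS `NVIC_SystemReset()`); the bootloader consumes the record early after reset and skips the failing check. The inverted copy of the address guards against random RAM contents after a power-on reset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Place the record in RAM that startup code does not zero. The section
 * name is an assumption: it must exist (NOLOAD) in your linker script. */
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
#define ECC_NOINIT __attribute__((section(".noinit")))
#else
#define ECC_NOINIT /* host build: ordinary zero-initialized static */
#endif

#define ECC_LOG_MAGIC 0x0ECC0BADu  /* arbitrary marker value */

typedef struct {
    uint32_t magic;    /* ECC_LOG_MAGIC when the record is valid */
    uint32_t address;  /* flash address whose read failed */
    uint32_t check;    /* ~address, rejects random RAM after power-on */
} EccLog_t;

static EccLog_t s_eccLog ECC_NOINIT;

/* Called from the fault/NMI handler just before resetting. */
void EccLog_Record(uint32_t address) {
    s_eccLog.address = address;
    s_eccLog.check = ~address;
    s_eccLog.magic = ECC_LOG_MAGIC;
}

/* Called early in the bootloader. Returns true and the failing address if
 * the previous run reset because of an ECC error; the record is consumed. */
bool EccLog_Consume(uint32_t *address) {
    bool valid = (s_eccLog.magic == ECC_LOG_MAGIC) &&
                 (s_eccLog.check == (uint32_t)~s_eccLog.address);
    if (valid) {
        *address = s_eccLog.address;
    }
    s_eccLog.magic = 0u;  /* invalidate so later resets start clean */
    return valid;
}
```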

    Graduate II
    May 9, 2025

    Not sure there's an SEH (Structured Exception Handler) in the try/catch sense.

    The stated issue is that returning simply retries the operation, hoping the handler has fixed the cause (say, by pulling in virtual memory), unless the handler digs into the context to fix or emulate the opcode that faulted, and advances past it. Making a general handler would be quite a task; making something relatively selective/specific, perhaps not so much.

    In that loop you could have it jump to a different location upon return, say breaking from the loop, by modifying the PC in the stacked context.

    You have to return, rather than have the handler jump directly, as the machine has to unstack the context and MCU/NVIC internal states.
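A sketch of the PC rewrite in the stacked context, assuming the faulting copy runs inside a known address range (for instance a dedicated function or linker section; the bounds and names are hypothetical). It also clears the IT/ICI bits in the stacked xPSR, so execution does not resume in a stale IT block, and keeps the Thumb bit set. The `resume` target is entered without a proper call, so it should not return normally; a longjmp trampoline, as suggested later in the thread, is one option.

```c
#include <stdbool.h>
#include <stdint.h>

/* Word offsets in the hardware-stacked frame (basic or extended). */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3, FRAME_R12,
       FRAME_LR, FRAME_PC, FRAME_XPSR };

#define XPSR_T_BIT      (1u << 24)
#define XPSR_IT_ICI_MSK 0x0600FC00u  /* IT/ICI bits [26:25] and [15:10] */

/* If the stacked PC lies in [start, end), make the exception return
 * continue at 'resume' instead of re-executing the faulting access.
 * Returns true when the frame was modified. */
bool frame_redirect_if_in(uint32_t *frame, uint32_t start, uint32_t end,
                          uint32_t resume) {
    uint32_t pc = frame[FRAME_PC];
    if (pc < start || pc >= end) {
        return false;  /* fault is not ours: leave the frame alone */
    }
    frame[FRAME_PC] = resume & ~1u;          /* stacked PC keeps bit 0 clear */
    frame[FRAME_XPSR] &= ~XPSR_IT_ICI_MSK;   /* no stale IT state */
    frame[FRAME_XPSR] |= XPSR_T_BIT;         /* stay in Thumb state */
    return true;
}
```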

    Super User
    May 9, 2025

    Since you haven't cured the ECC error (erase & rewrite the whole sector), the error condition remains. As Tesla wrote, there's no easy answer. For one, you can mask the memory fault permanently and resume running, leaving the proper fix for later.

    Super User
    May 9, 2025

    If you use asm to read the FLASH (and, at least partially, also for the ISR), you know which register contains the offending address, so you can increment it or set it to a safe value (modifying the register directly or its copy on the stack, depending on which register it is) before returning.

    JW 
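A sketch of the handler side of this idea, assuming a one-instruction asm read helper (hypothetical, shown in the comment) so that the address is guaranteed to be in r0, and assuming the fault is precise, so that the stacked PC still points at the `ldr` and the exception return retries it. Instead of incrementing r0, this variant points it at a readable dummy word, so the retried load succeeds and the driver sees a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed asm helper (hypothetical name), GNU as, Thumb:
 *
 *   Fls_ReadWordAsm:        @ uint32_t Fls_ReadWordAsm(uint32_t addr)
 *       ldr  r0, [r0]       @ the only access that can hit a bad ECC word
 *       bx   lr
 *
 * Because the address is known to be in r0, the handler can repoint it. */

enum { STK_R0 = 0, STK_PC = 6 };

volatile bool g_flsEccHit;  /* polled by the C read loop after each word */

/* Called from the fault/NMI handler with the stacked frame.
 * ldr_addr:  address of the ldr above (Fls_ReadWordAsm, bit 0 ignored)
 * safe_addr: address of a readable dummy word, e.g. in RAM
 * If the fault came from that ldr, repoint r0 so the retried ldr succeeds. */
bool ecc_repoint_read(uint32_t *frame, uint32_t ldr_addr, uint32_t safe_addr) {
    if (frame[STK_PC] != (ldr_addr & ~1u)) {
        return false;           /* not our access: leave the frame alone */
    }
    frame[STK_R0] = safe_addr;  /* retried "ldr r0, [r0]" reads the dummy */
    g_flsEccHit = true;
    return true;
}
```

The C read loop would then call the helper once per word and check and clear `g_flsEccHit` after each call, returning E_NOT_OK if it was set.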

    MrJorgeAuthor
    Explorer II
    May 14, 2025
    My goal is to avoid a reset when an ECC error occurs during Fls_Read, which should simply return E_NOT_OK because of the corrupted memory location (a result I was able to obtain on other MCUs without complex trap management and stack manipulation).
    Here is a small update based on my research (see the code below):
    I found a way to avoid the reset and still detect errors: disable faults before the flash access and re-enable them afterwards, while checking the flash peripheral status registers to detect errors. However, I do not like this approach, since I would be afraid of losing other, possibly dangerous, faults (and because, if my understanding is correct, it would also disable interrupts during the operation).
    Std_eReturn_t Fls_Read( uint32_t u32StartAddress, uint32_t u32Length, uint8_t *pu8Buffer )
    {
     Std_eReturn_t eRet = E_OK;
    
     /* Flash read simply copies bytes from flash to the buffer */
    
     // Set FAULTMASK = 1: raises execution priority to -1, masking all
     // exceptions and interrupts except NMI
     __set_FAULTMASK( 1 );
    
     // Set BFHFNMIGN: ignore precise data BusFaults from loads/stores while
     // running at priority -1 (FAULTMASK set, HardFault) or -2 (NMI)
     SCB->CCR |= SCB_CCR_BFHFNMIGN_Msk;
     __DSB();
     __ISB();
    
     memcpy( pu8Buffer, ( const void * )u32StartAddress, u32Length );
    
     // TODO: check for an ECC double-bit error and set eRet = E_NOT_OK if found
     // (errors are reported on SR1/2 and on ECC_FA1/2)
    
     // Clear BFHFNMIGN bit
     SCB->CCR &= ~( SCB_CCR_BFHFNMIGN_Msk );
     __DSB();
     __ISB();
    
     // Clear FAULTMASK = 0
     __set_FAULTMASK( 0 );
    
     return eRet;
    }
    A suitable alternative, which seems far less dangerous, would be to insert a label just after the memcpy operation in the Fls driver and, inside TrapHandler, modify the return point after fault management so that the application software restarts from that label (and not from the memcpy operation) if the fault is recognized as belonging to Fls_Read.
    However, I was not able to change, from inside the trap handler, the next instruction to be executed, so that the program counter restarts from a different point after fault management.
    If any of you knows how to accomplish this, it would be very helpful.
     
    Thank you again for your time
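The part that is hard to do in plain C is getting at the stacked frame, since the compiler's prologue may change SP before the handler body runs. A common pattern, sketched here under assumptions (GCC, ARMv7-M cores such as the Cortex-M4/M7 in G4/H7, and the CMSIS handler name `NMI_Handler`; on H7 the ECC error may arrive as a different exception), is a naked entry shim that selects MSP or PSP from EXC_RETURN and passes the frame pointer to a C function. The stacked PC is then `frame[6]`; whatever the C function writes there is where execution resumes after the exception return.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 2 tells which stack holds the frame: 0 = MSP, 1 = PSP. */
bool exc_return_uses_psp(uint32_t exc_return) {
    return (exc_return & 0x4u) != 0u;
}

/* C part of the handler (hypothetical name). frame[6] is the stacked PC,
 * frame[7] the stacked xPSR; after clearing the flash error flags, this is
 * where the fault would be matched to Fls_Read and frame[6] rewritten. */
void Trap_HandlerC(uint32_t *frame) {
    (void)frame;
}

#if defined(__GNUC__) && defined(__ARM_ARCH_PROFILE) && \
    (__ARM_ARCH_PROFILE == 'M') && (__ARM_ARCH >= 7)
/* Naked entry: no prologue touches SP before it is captured. lr holds
 * EXC_RETURN; the tail branch keeps it, so when Trap_HandlerC returns the
 * core performs the exception return and unstacks the (edited) frame. */
__attribute__((naked)) void NMI_Handler(void) {
    __asm volatile(
        "tst   lr, #4        \n"
        "ite   eq            \n"
        "mrseq r0, msp       \n"
        "mrsne r0, psp       \n"
        "b     Trap_HandlerC \n");
}
#endif
```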

    Super User
    May 17, 2025

    Interesting, I wasn't aware of the existence of the SCB_CCR.BFHFNMIGN feature/bit. Thanks.

    And your NMI handler was just a return, then?

     

    A suitable alternative, which seems far less dangerous, would be to insert a label just after the memcpy operation in the Fls driver and, inside TrapHandler, modify the return point after fault management so that the application software restarts from that label (and not from the memcpy operation) if the fault is recognized as belonging to Fls_Read.

    As I've said above (although I suggested something slightly different: just avoiding the repeated reads, and possibly setting some flag), doing this sort of thing requires asm, as it's straight against the grain of any higher-level language.

    JW

    Super User
    May 17, 2025

     as it's straight against the grain of any higher-level language.

    Maybe this can be implemented at the C level using setjmp/longjmp (exit the NMI exception to a trampoline that calls longjmp).
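A minimal sketch of the setjmp/longjmp trampoline, with hypothetical names and stand-in types. The fault handler (after clearing the flash error flags) rewrites the stacked PC to `Fls_EccTrampoline` when a guarded read is active; the exception returns normally, and the trampoline, now running in Thread mode, longjmps back into the read function, which returns E_NOT_OK. The copy function is a pointer only so the sketch can be exercised on a host.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { E_OK = 0, E_NOT_OK = 1 } Std_eReturn_t;  /* stand-in type */

static jmp_buf s_flsJmp;
volatile int Fls_bReadActive;  /* nonzero while a guarded flash copy runs */

/* Copy routine; a variable only so the sketch can be exercised off-target. */
void *(*Fls_Copy)(void *, const void *, size_t) = memcpy;

/* Reached via the rewritten stacked PC, i.e. after the exception has been
 * fully exited, which is what makes longjmp safe here. Never returns. */
void Fls_EccTrampoline(void) {
    longjmp(s_flsJmp, 1);
}

Std_eReturn_t Fls_ReadGuarded(const void *src, size_t len, uint8_t *dst) {
    if (setjmp(s_flsJmp) != 0) {
        Fls_bReadActive = 0;  /* resumed here via the trampoline */
        return E_NOT_OK;
    }
    Fls_bReadActive = 1;
    Fls_Copy(dst, src, len);
    Fls_bReadActive = 0;
    return E_OK;
}

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Call from the C part of the fault/NMI handler with the stacked frame. */
void Fls_OnEccFault(uint32_t *frame) {
    if (Fls_bReadActive) {
        frame[6] = (uint32_t)(uintptr_t)&Fls_EccTrampoline & ~1u; /* PC */
        frame[7] = (frame[7] & ~0x0600FC00u) | (1u << 24); /* no IT, T set */
    }
}
#else
/* Host-only stand-in for a copy that faults: behaves as if the handler had
 * already redirected the exception return to the trampoline. */
void *Fls_HostFaultingCopy(void *dst, const void *src, size_t len) {
    (void)src;
    (void)len;
    Fls_EccTrampoline();
    return dst;
}
#endif
```

On target, `Fls_Read` could call `Fls_ReadGuarded((const void *)u32StartAddress, u32Length, pu8Buffer)`, and the handler could pass the stacked frame to `Fls_OnEccFault()`.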

    Super User
    May 18, 2025

    There was another thread recently where the user said that, after exiting the NMI, the program resumed at the instruction after the one that produced the ECC error. I wonder which is correct, or what the difference in results can be attributed to.

    Super User
    May 18, 2025

    IIRC this depends on the phase of execution of the instruction. Some instructions are marked as "done" and the IP advances to the next instruction. In other cases the IP stays on the offending instruction, so returning from the exception repeats it. I could never remember the rules. Jumping to an explicit address instead of simply resuming from the stacked IP avoids the guesswork.

    Super User
    June 7, 2025

    @MrJorge 

    Show us the original NMI handler, which you said returns to the failing instruction.

    JW