Graduate
October 30, 2023
Question

EEPROM emulator corrupts STM32G431KBU6 program

  • October 30, 2023
  • 11 replies
  • 8502 views

Hi all,

We deployed a lot of devices that have an STM32G431KBU6 controller.
It basically has a foot switch to turn on/off a 12 V DC motor and a rotary switch to control the speed.

We use the EEPROM emulator to save the speed setting.
If the speed setting differs from the stored value and no new speed change is detected within a minute, we store the new value (to save unnecessary writes).

Clients mostly only set the speed setting once when they first use the product.

So an emulated EEPROM write is almost never done, only an emulated EEPROM read at startup of the device.

Somehow, support receives a lot of complaints about non-responding devices. Only reprogramming the device helps, but this is a temporary fix.

Sometimes clients tell us they had a power outage before these issues appear.

We cannot reproduce the issue, maybe because our mains power is very clean?

Can power glitches (12 V power jack) cause flash program corruption?

    This topic has been closed for replies.

    Graduate
    October 30, 2023

    The eeprom emulator writes into pages of FLASH memory. And your program is also stored in FLASH memory.

    What have you done to ensure that the pages of FLASH used for EEPROM-emulation are not pages where the compiler/linker places your program?

    You probably need to alter the memory-map from the default one for your processor to set aside at least two pages of FLASH for EEPROM emulation, and alter the source-code of the EEPROM emulation to use those pages.

    WSpar.1Author
    Graduate
    October 31, 2023

    I instructed the assembly team to program our code to address 0x08000000

    At startup I read in our values:

    //Small delay for potential discharge before unlocking flash
     HAL_Delay(600);
    
     /* Unlock the Flash Program Erase controller */
     HAL_FLASH_Unlock();
    
     ee_status = EE_Init(EE_CONDITIONAL_ERASE);
     if(ee_status != EE_OK) {Error_Handler();}
    
     ee_status = EE_ReadVariable32bits(1, &EEPROM_MotorSpeedManual);
     ee_status = EE_ReadVariable32bits(2, &EEPROM_MotorSpeed);
     ee_status = EE_ReadVariable32bits(3, &EEPROM_MotorDirection);

    Later on, when a setting change is detected, I write the new inputs after one minute

    if(g_SettingsChanged == true && HAL_GetTick()-lastSettingsChangedTime > 60000)
    	{
    		HAL_FLASH_Unlock();
    		EE_WriteVariable32bits(1, S_MotorSpeedManual);
    		EE_WriteVariable32bits(2, S_MotorSpeed);
    		EE_WriteVariable32bits(3, S_MotorDirection);
    		HAL_FLASH_Lock();
    
    		g_SettingsChanged = false;
    		lastSettingsChangedTime = HAL_GetTick();
    	}
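
The "write only after a minute without changes" rule can also be kept in a small state machine that is testable off-target. The following is a minimal sketch, not the code running on the device: the type `save_debounce_t` and the three helper functions are hypothetical names, and the caller is assumed to pass in the current tick (for example from `HAL_GetTick()`). Unlike a plain "changed" flag, it skips the flash write entirely when the user turns the knob back to the value that is already stored.

```c
#include <stdbool.h>
#include <stdint.h>

#define SETTLE_TIME_MS 60000u  /* one minute without changes, as in the post */

/* Hypothetical bookkeeping for one stored setting. */
typedef struct {
    uint32_t stored;       /* value currently held in non-volatile storage */
    uint32_t pending;      /* most recent value requested by the user */
    uint32_t last_change;  /* tick (ms) of the most recent real change */
} save_debounce_t;

/* Record a user input; the settle timer restarts only on a real change. */
static void debounce_input(save_debounce_t *d, uint32_t value, uint32_t now_ms)
{
    if (value != d->pending) {
        d->pending = value;
        d->last_change = now_ms;
    }
}

/* Returns true when the caller should write d->pending to storage now. */
static bool debounce_should_write(const save_debounce_t *d, uint32_t now_ms)
{
    if (d->pending == d->stored) {
        return false;  /* nothing new to save, so no flash write at all */
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    return (uint32_t)(now_ms - d->last_change) > SETTLE_TIME_MS;
}

/* Call after a successful write so the same value is not written twice. */
static void debounce_mark_written(save_debounce_t *d)
{
    d->stored = d->pending;
}
```

In the main loop this would be polled with the current tick, and `debounce_mark_written()` called only once the write routine reports success.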

     

    Where can I verify where EEPROM values are written to?

    Graduate II
    October 31, 2023

    How does dropping into Error_Handler() fix anything? Won't that just die silently in a while(1) loop by default?

    Perhaps you need a better strategy? Like addressing the issue, erasing the section of FLASH being used, and write usable defaults? Or whatever is needed for it to initialize the memory so EE_Init() succeeds.
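
One way to picture that strategy is a boot routine that degrades step by step instead of stopping. This is a rough sketch under stated assumptions: `storage_ops_t`, `settings_boot()` and `DEFAULT_SPEED` are hypothetical names, the three callbacks stand in for whatever init, erase and read routines the product really uses, and the `fake_*` functions exist only so the policy can be run on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical storage interface; on the target these would wrap the
 * real init, erase/format and read routines. */
typedef struct {
    bool (*init)(void);             /* true on success */
    bool (*format)(void);           /* erase the storage area, true on success */
    bool (*load)(uint32_t *speed);  /* true if a stored value was found */
} storage_ops_t;

typedef enum {
    BOOT_STORED_VALUE,   /* storage healthy, stored setting restored */
    BOOT_DEFAULT_VALUE,  /* storage usable but empty or freshly formatted */
    BOOT_RAM_ONLY        /* storage unusable: run on defaults, skip saving */
} boot_result_t;

#define DEFAULT_SPEED 3u  /* assumed safe default, not taken from the thread */

static boot_result_t settings_boot(const storage_ops_t *ops, uint32_t *speed)
{
    *speed = DEFAULT_SPEED;

    if (!ops->init()) {
        /* First line of repair: wipe the storage area and try once more. */
        if (!ops->format() || !ops->init()) {
            return BOOT_RAM_ONLY;  /* the motor stays usable regardless */
        }
        return BOOT_DEFAULT_VALUE;
    }
    if (!ops->load(speed)) {
        *speed = DEFAULT_SPEED;    /* discard anything load() left behind */
        return BOOT_DEFAULT_VALUE;
    }
    return BOOT_STORED_VALUE;
}

/* Host-side stand-ins so the policy can be exercised off-target. */
static int  fake_init_failures;  /* number of init() calls that will fail */
static bool fake_format_ok;
static bool fake_has_value;

static bool fake_init(void)
{
    if (fake_init_failures > 0) {
        fake_init_failures--;
        return false;
    }
    return true;
}
static bool fake_format(void) { return fake_format_ok; }
static bool fake_load(uint32_t *speed)
{
    if (!fake_has_value) {
        return false;
    }
    *speed = 7u;
    return true;
}
static const storage_ops_t fake_ops = { fake_init, fake_format, fake_load };
```

The point of the three-way result is that none of the branches ends in an endless loop: even with unusable storage the device still runs, just without remembering the setting.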

    If the device had a battery presumably NVRAM/BKP would be a better place to store things temporarily?

    Perhaps you could learn to use FLASH directly, journal your writes, check them, etc.?

    Super User
    October 30, 2023

    Reads from flash are unlikely to cause issues even if power is unstable. If power drops below the BOR threshold, chip will reset and all is well.

    Have you recovered a "nonfunctioning" device and examined the flash on it?

    Writes could certainly cause issues, but it doesn't sound like that is occurring.

    WSpar.1Author
    Graduate
    October 31, 2023

    As you can see in my code, I already unlock flash at startup and read the EEPROM values if there are any. Otherwise I write the default values and lock flash.

    Maybe I can improve this by storing the defaults in EEPROM when the device is programmed? Not sure how.

    To avoid recalling a lot of devices, my client wants a self-repairing mechanism.
    Some form of factory reset/repair option.
    For example, a backup of working firmware that the chip can reprogram itself with.

    Is that possible? The program fits in the microcontroller's flash several times over. Hopefully ST has a bootloader/mechanism for this.

     

    Super User
    October 31, 2023

    I seem to recall previous posts on this forum that mention that the ST EEPROM emulation code ALWAYS writes to the EEPROM flash area on startup.  It writes to the first "unused" page (or 64-bit row).  This was to handle the case where a previous write operation may have been interrupted by a power failure or reset.  If I remember correctly, if the flash was partially programmed yet reads back as 0xffff, a 2nd attempt to program it may fail.  This "always write the next block on startup" was a way to ensure that writes from user code would always write to "good" flash.  You can check the code to see if that really is the case.

    You do not need (and SHOULD NOT) unlock the flash on startup.  Only unlock it right before you write to flash - and the EEPROM emulation code should handle that internally.  You shouldn't need to.

    There are #defines for which FLASH pages are used.  See AN4894, section 5.1.3, "User Defines" (at least in the version I have).  But as @TDK mentioned, as long as the FLASH used by the EEPROM emulation does not overlap the FLASH used for code, none of that should corrupt your code.

    I don't recall if the G431 has dual flash banks.  If it does, you should enable dual bank mode and make sure the EEPROM emulation is in the 2nd bank and your code in the first bank (which it is if you load code at 0x08000000).

    Graduate II
    October 31, 2023

    Products that die silently in Error_Handler() will look like bricks to a consumer, and be very hard to recover via any kind of user interface.

    For things I need to allow for user configuration in a semi-permanent fashion, I have workable defaults, I recover settings from FLASH, and manage the write, erase, journalling of the configuration structure directly. The structure has integrity checking so you know if it wrote completely/correctly.
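
A configuration structure with integrity checking and workable defaults could look roughly like the sketch below. It is only an illustration of the idea, not the poster's code: the record layout, `CONFIG_MAGIC`, the default values and all function names are hypothetical, and a plain software CRC-32 is used so the example runs on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONFIG_MAGIC 0x43464731u  /* arbitrary marker, hypothetical */

typedef struct {
    uint32_t magic;            /* identifies a record of this layout */
    uint32_t motor_speed;
    uint32_t motor_direction;
    uint32_t crc;              /* CRC-32 over the three fields above */
} config_record_t;

/* The G4 programs flash in 64-bit double words, so keep the size a multiple of 8. */
static_assert(sizeof(config_record_t) % 8u == 0u, "record must fill whole double words");

/* Assumed defaults that keep the product usable; values are placeholders. */
static const config_record_t config_defaults = { CONFIG_MAGIC, 3u, 0u, 0u };

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), small rather than fast. */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

static uint32_t config_crc(const config_record_t *r)
{
    return crc32_calc((const uint8_t *)r, offsetof(config_record_t, crc));
}

/* Fill in magic and CRC just before the record is written out. */
static void config_seal(config_record_t *r)
{
    r->magic = CONFIG_MAGIC;
    r->crc = config_crc(r);
}

/* Copies the stored record to *out if it is intact, otherwise the defaults.
 * Returns 1 when the stored record was used, 0 when defaults were used. */
static int config_load(const void *stored, config_record_t *out)
{
    config_record_t tmp;
    memcpy(&tmp, stored, sizeof tmp);
    if (tmp.magic == CONFIG_MAGIC && tmp.crc == config_crc(&tmp)) {
        *out = tmp;
        return 1;
    }
    *out = config_defaults;
    return 0;
}
```

An erased area (all 0xFF) fails the magic check, and a torn or bit-flipped record fails the CRC, so both cases end in the defaults rather than in an error loop. This host-side sketch does not model what the flash controller itself does when a half-programmed double word is read, which may need separate handling on the real part.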

    The EEPROM emulation is a black-box of halfassery ..

    WSpar.1Author
    Graduate
    October 31, 2023

    You mean you are not using EEPROM emulation at all?
    And have workable defaults on every reboot?

    In my case I always need custom set values on reboot

    WSpar.1Author
    Graduate
    October 31, 2023

    In my code, I found the following EEPROM config

    /* Configuration of eeprom emulation in flash, can be custom */
    #define START_PAGE_ADDRESS 0x0801B000U /*!< Start address of the 1st page in flash, for EEPROM emulation */
    #define CYCLES_NUMBER 1U /*!< Number of 10Kcycles requested, minimum 1 for 10Kcycles (default),
     for instance 10 to reach 100Kcycles. This factor will increase
     pages number */
    #define GUARD_PAGES_NUMBER 2U /*!< Number of guard pages avoiding frequent transfers (must be multiple of 2): 0,2,4.. */
    
    /* Configuration of crc calculation for eeprom emulation in flash */
    #define CRC_POLYNOMIAL_LENGTH LL_CRC_POLYLENGTH_16B /* CRC polynomial length 16 bits */
    #define CRC_POLYNOMIAL_VALUE 0x8005U /* Polynomial to use for CRC calculation */

    Super User
    November 1, 2023

    > You mean you are not using EEPROM emulation at all?

    Putting words in @Tesla DeLorean's mouth - he probably doesn't use ST's EEPROM emulation, but rather his own custom version.  That is what I do.

    **IF** you can guarantee that you have stable power and operating conditions when the motor config data is written to FLASH (the ubiquitous "updating, do not remove power" kind of situation, and code that won't reboot on FLASH errors), you don't need the ST EEPROM code and its side effects (writing to FLASH on every startup). Pick a page of FLASH and write the data (with checksums/CRCs/etc. for validation).
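
As a rough illustration of "pick a page and write the data with validation", the sketch below treats one 2 KB page as an append-only journal of 64-bit slots and looks up the newest intact value. Everything here is an assumption made for the example: the names `slot_t`, `journal_latest()` and `journal_next_free()` are hypothetical, the page is an ordinary RAM array standing in for flash, and the check is a simple bitwise complement rather than a CRC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 2048u                   /* flash page size on the G431 */
#define SLOTS_PER_PAGE  (PAGE_SIZE_BYTES / 8u)  /* one 64-bit slot per write */

/* One slot: the value plus its bitwise complement as a simple check. */
typedef struct {
    uint32_t value;
    uint32_t check;  /* must equal ~value for the slot to count */
} slot_t;

/* Erased flash reads back as all ones. */
static bool slot_is_erased(const slot_t *s)
{
    return s->value == 0xFFFFFFFFu && s->check == 0xFFFFFFFFu;
}

/* An erased slot never passes, because ~0xFFFFFFFF is 0. */
static bool slot_is_valid(const slot_t *s)
{
    return s->check == (uint32_t)~s->value;
}

/* Finds the newest intact value in the page; returns false if there is none. */
static bool journal_latest(const slot_t *page, uint32_t *value)
{
    bool found = false;
    for (size_t i = 0; i < SLOTS_PER_PAGE; i++) {
        if (slot_is_valid(&page[i])) {
            *value = page[i].value;  /* later slots override earlier ones */
            found = true;
        }
    }
    return found;
}

/* Index of the first erased slot, or SLOTS_PER_PAGE when the page is full
 * and has to be erased before the next write. */
static size_t journal_next_free(const slot_t *page)
{
    for (size_t i = 0; i < SLOTS_PER_PAGE; i++) {
        if (slot_is_erased(&page[i])) {
            return i;
        }
    }
    return SLOTS_PER_PAGE;
}
```

A slot that was interrupted mid-write fails the complement check and is simply skipped, so the previous good value stays in effect. With a setting that changes only a handful of times in the product's life, a single page of 256 slots would rarely need erasing at all.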

    WSpar.1Author
    Graduate
    November 1, 2023

    Well 7 out of 12 devices got corrupted somehow after a power outage.
    I still have to receive one device so I can diff the content.
    But all devices are repairable with a reprogram.

    My PCB ground is connected to the metal housing.

    Maybe use brown-out detector to block Flash_unlock() from executing when power is unstable?

    WSpar.1Author
    Graduate
    November 1, 2023

    Ok found something interesting today.
    STM32CubeProgrammer doesn't erase all sectors when programming



    18:16:52 : Address : 0x08000000
    18:16:52 : Erasing memory corresponding to segment 0:
    18:16:52 : Erasing internal memory sectors [0 52]
    18:16:53 : Download in Progress:

    I separated my EEPROM pages a bit further away, to 0x0801C000 (btw, what would be a good way to determine the end address of my program?)
    After programming, all EEPROM writes are still there.
    So EE_Init() doesn't clear anything?
    Is it expected behavior that every write is on a new line?

    Since I got these chips via brokers, I don't trust the content of these chips.
    The programmer only erases the sectors that are going to be programmed for my program? 

    Graduate II
    November 1, 2023

    End of the program, or end of the image? The linker packs statics at the back of the code/text section.

    You should be able to find or place a symbol via the Linker Script, or create a suitably aligned or placed MEMORY area that's distinct from the area the linker can put code/data in.

    End of Image is typically _sidata + (_edata - _sdata)

    In Keil there are some $$LIMIT$$ or $$SIZE$$ type symbols
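
To put numbers on the question: the programmer log above erased sectors 0 to 52, so the image ends no later than the start of page 53, and with 2 KB pages the address 0x0801B000 is page 54. The small host-side sketch below does that arithmetic and checks that a storage area sits clear of the image. The function names and the `storage_pages` parameter are hypothetical; on the target, the image end would come from a linker symbol such as the `_sidata + (_edata - _sdata)` expression mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BASE_ADDR  0x08000000u
#define FLASH_PAGE_BYTES 0x800u  /* 2 KB pages on the STM32G431 */
#define FLASH_PAGE_COUNT 64u     /* 128 KB part */

/* Page index that contains the given flash address. */
static uint32_t page_of(uint32_t addr)
{
    return (addr - FLASH_BASE_ADDR) / FLASH_PAGE_BYTES;
}

/* First address of the given page. */
static uint32_t page_start(uint32_t page)
{
    return FLASH_BASE_ADDR + page * FLASH_PAGE_BYTES;
}

/* True if a storage area of storage_pages pages starting at storage_addr is
 * page-aligned, lies wholly above the image, and fits inside the flash.
 * image_end_addr is the first address after the last byte of the image. */
static bool storage_is_clear(uint32_t image_end_addr,
                             uint32_t storage_addr,
                             uint32_t storage_pages)
{
    uint32_t first_storage_page = page_of(storage_addr);
    uint32_t last_image_page = page_of(image_end_addr - 1u);

    return (storage_addr % FLASH_PAGE_BYTES == 0u)
        && (first_storage_page > last_image_page)
        && (first_storage_page + storage_pages <= FLASH_PAGE_COUNT);
}
```

With the figures from the thread, `page_start(53u)` is 0x0801A800, 0x0801B000 is page 54 and 0x0801C000 is page 56, so the original placement left a gap of only one page between the image and the emulated EEPROM.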

    Super User
    November 2, 2023

    If they were corrupted after a power outage, is it possible that power flickered (or glitched) on and off as power went out or came back on?  If so, the corruption could be from the EE_Init write to FLASH that I mentioned before, i.e. power glitches on just long enough for your code to get to the write operation in EE_Init then power drops and corrupts that write.

    If you are able to read out the "corrupted" FLASH contents, it will be interesting to see if the corruption was in code space or your EEPROM data space (could corrupted config data make it look like your program was corrupted?).

    WSpar.1Author
    Graduate
    November 2, 2023

    Yes this approach has priority.

    Can program space also get corrupted with power glitches while Flash is locked?
    Is there a possibility to only unlock the EEPROM pages and keep program pages locked?

    I'm running a stress test that has been writing EEPROM values for more than 24 hours without any issues. So power glitches or ESD discharge seem more likely.

    Firmware improvements I can think of now are:
    * Don't use Flash_unlock() high in my main()
    * Unlock/Lock direct before and after the write.
    * Detect setting changes and only write and unlock Flash after 1 minute of no new incoming changes and detect stable voltages. (BOR detection?)

    * Use EE_Init(EE_FORCED_ERASE); instead of EE_Init(EE_CONDITIONAL_ERASE);
    * Properly handle any errors that can occur. (How can I simulate these?)

     

    My client wants a self repairing mechanism to avoid so many support tickets.
    But since it is a single bank flash I think it is really hard to have the program multiple times in flash and build some form of in-app-programmer that CRC checks the program and repair it.
    Probably this is getting very complex with Interrupt vectors etc.

    Maybe I can convert the USB-mini port to an OTG one and program from a flash drive.
    Does such a functionality exist or even fit in the bootloader space?

    Super User
    November 2, 2023

    > Unlock/Lock direct before and after the write.

    Your code should NEVER unlock the flash.  The EEPROM emulation calls should handle that (I think).

    > Detect setting changes and only write and unlock Flash ...... detect stable voltages. (BOR detection)

    This is useless.  The only way you can guarantee stable voltages for the duration of the FLASH erase/write cycle is to have an EXTERNAL brown-out detector (not the on-chip one) and enough bulk capacitance to provide power to the CPU after input power goes away so that the FLASH operations can complete.  Once the CPU starts a FLASH write operation, there is nothing you can do in software to prevent flakey power from interfering/corrupting the data.

    And once again, I don't think your "store config data" operation is what is corrupting the FLASH.  That only happens once, right?  I think the issue is the EEPROM emulation init code that ON EVERY SINGLE POWER ON writes to FLASH.

    FLASH should be stable (good/valid) regardless of good/bad power if code never attempts to write to it.  It is only when FLASH write operations happen with "bad" power that things can get corrupted.

    And if it is your code that is getting corrupted, then there is nothing you can do.  Period.  Except maybe provide a way for the customer to force the internal boot loader to run and re-program the chip from scratch (I wouldn't want MY customers doing that).  If it is only the configuration data that gets corrupted, then as @Tesla DeLorean said you need to have some kind of default/fall-back config that will allow the system to run, and then allow the customer to re-program their specific configuration.

    Graduate II
    November 2, 2023

    @WSpar.1 wrote:

    It basically has a foot switch to turn on/off


    Here is your power outage: how do you prevent the user from switching off at the same moment as an EE write?
    Your design requires a redesign, but start by locating what in flash is corrupted.

    WSpar.1Author
    Graduate
    December 4, 2023

    Hi all,

    I got one bricked device back from the client and saved both the entire flash and my EEPROM pages.
    Attached is the EEPROM export starting from 0x0801b000 to the end of the flash region.
    I noticed that at 0x0801b800 it restarts the headers, or is this a guard page?

    [Image: WSpar1_0-1701710846417.png]

    I can now inject the hex dump into freshly programmed devices and reproduce the bricking.
    I cannot reproduce how the EEPROM pages got corrupted, but at least I can now start writing code that should be able to repair this.

    Can someone here have a look at the file?
    Should EE_Init be able to handle this?

    Another issue I face now is that Debug inside STM32CubeIDE is probably erasing the entire flash?
    Because in debug mode everything seems to work again.
    It is only when injecting the hex file with STM32CubeProgrammer that I can brick a device.

    But I like to verify where it is hanging in the code.

    Graduate II
    December 4, 2023

    Is it right to say your brick is a jump to Error_Handler() after EE_Init()?

    From the hex dump it isn't clear what fails.

    The IDE or the programmer can be set to erase the full flash or only the required part.

    Debug can be started without a flash erase; set it in the debug config.

    Trace EE_Init for error source after stored hex dump reflash...

    WSpar.1Author
    Graduate
    December 8, 2023

    So far I have not found an option to skip programming when starting a debug session in STM32CubeIDE.

    Also, an error in EE_Init should turn on a red LED, but nobody has ever reported that LED being on.
    Could it be NMI / ECCD on the G4 series?

    It is really difficult to find out what is going on here.
    I have read the EEPROM emulation documentation multiple times. I see some points I can improve on, but in my opinion they are not linked to the real issue.

    The metal foot pedal on the device has a TVS diode, but other than that it goes straight to an input pin on the uC.
    Maybe I have ESD issues here?

    For now I try to collect bricked devices and make full flash dumps of them to figure out what is going on here.

    If I look at the ST G4 eeprom example, it is basically implemented the same as what I have now:

    A Flash_unlock() high in the main loop and unlocked for a long time

    And error handling:

    static void Error_Handler(void)
    {
        while (1)
        {
            /* Toggle LED_KO (Red) fast */
            BSP_LED_Toggle(LED_KO);
            HAL_Delay(40);
        }
    }

    So the examples are not great.
    Does someone know of a very good implementation example?