Visitor II
May 25, 2022
Question

Severe reset issues with FreeRTOS


This is a ticket I opened this year regarding a fatal flaw in FreeRTOS, for which I never got a final answer or a proper solution. The team I've been working with managed to find a workaround that has so far proven very effective, but we need a permanent solution. If a watchdog is running and the programmer doesn't know about this issue, the device might stall or brick until a full power cycle is done, or until it is reprogrammed if an upgrade (firmware over the air, for example) was in progress.

Here's the post; for the ST team, the case number is 00153229:

"Greetings,

We are developing the firmware of a medical device using the STM32L4S9AI microcontroller and are having the following issue: 50% of the time we reset the core, the device doesn't boot. We found that it stalls in the PendSV-related functions.

Since the hardware is from a client and there's a non disclosure agreement, we are limited to the information we can provide.

But here's the current information about the configuration that I can provide without prejudice:

  • Internal flash is divided into 3 sections: pre-bootloader, bootloader and main firmware
  • FreeRTOS is being used (CMSIS v2)
  • Low Power timer is configured for low power modes (stop mode 2)

This problem appeared once we started using the stop modes and other hardware features, but we realised that was just a coincidence. Further testing shows that if the boot is successful, the next time we reset the watch it will stall once "xPortPendSVHandler" is called. The first two firmware stages run perfectly, but once the transition to the third firmware occurs, it stalls.

We managed to build a first workaround that actually works, but we are afraid it might be only a temporary solution as the program grows.

One fix is, for the standby and reboot we de-initialize all hardware and make a reset (works 100% of the time); the other is to clear the RAM in the pre-bootloader, but it's not working properly all the time.

Thanks.

Best regards,

André Pereira"

Here's one of my answers with an important detail:

"Greetings Mr. ########,

In a power-up cycle everything works fine because all the memory is written from scratch, but when resetting, part of the ram memory for some reason persists, causing a hard fault when it calls PendSV, because a register is for some reason active/true/enabled during the boot. We suspect there's also a ghost process running that triggers the hard fault."


    6 replies

    Graduate II
    May 25, 2022

    Not sure I learned much from reading that.

    A team of people spent six months unsuccessfully debugging their project...

    APereiraAuthor
    Visitor II
    May 25, 2022

    The part where this fails isn't caught simply by debugging, and it happens during the third part. We also didn't dedicate 6 months to solving this. We solved it but, as mentioned, it's a workaround; all users need a permanent answer, not a workaround developed by us, the client.

    Graduate II
    May 25, 2022

    Ok, but unless I missed some major plot points here, you never determined what the actual cause was, and it was attributed to some "ghost" process? Not sure how that's going to pass muster in a certification report.

    And this for a system where you have access to all the source code?

    Unless you can actually attribute it to a hardware failure/short coming in the IC, I'm not sure why ST would get deeply involved in debugging.

    What compiler/tool chain are you using? GNU/GCC based?

    Any other stages of the loader using the RTOS, or HAL/MX initialization?

    Super User
    May 25, 2022

    Like @Community member, I don't see anything in that which would allow anyone here to say what's going on.

    @André Pereira​ "a fatal flaw in FREE RTOS"

    FreeRTOS is an independent 3rd-party product - nothing to do with ST.

    So, if you've identified a flaw in FreeRTOS, you need to report that to them:

    https://www.freertos.org/RTOS-contact-and-support.html

    "when resetting, part of the ram memory for some reason persists"

    If power is retained, then all of the RAM will persist - that is to be expected.

    "suspect there's also a ghost process running"

    What makes you think that? What debugging have you done to find it?

    APereiraAuthor
    Visitor II
    May 25, 2022

    After careful analysis, @Andrew Neil, it's safe to say that it's an issue for both to solve, because it happens in a part of the code that is made by ST and only where the firmware with the RTOS runs. The other two don't have this issue and transitioned correctly even before our workaround was implemented.

    Super User
    May 25, 2022

    But, like @Community member, I don't see that you've identified what the actual problem is.

    APereiraAuthor
    Visitor II
    May 25, 2022

    But we did, check my answer to Tesla Delorean.

    Super User
    May 26, 2022

    Or even answer the questions already asked by people trying to help you. So one more time:

    Does this happen in your pre-bootloader, bootloader, or application?

    Do either your pre-bootloader or bootloader use FreeRTOS, or only your main application?

    Make DARN SURE that your pre-boot and bootloaders are disabling any interrupts that they enabled before jumping to your application.

    If the pendSV interrupt is firing, then my first presumption is that either (a) some code BEFORE that startup code you highlighted ran FreeRTOS and somehow it left some interrupt enabled that ran some (OLD) task that set the pendSV bit to call the RTOS, or (b) you have a pointer issue somewhere before that code (perhaps in the bootloader) that is setting the pendSV bit by accident (highly unlikely). Since this happens in the startup code, the interrupt vector table pointer has (probably) not been updated yet, so any interrupt or fault that happens will jump through an OLD vector table. For example, if this occurs in your application, the pendSV vector will be fetched from the vector table used by your pre-boot or bootloader. As will any OTHER interrupt that might still be enabled (systick, timer, UART, etc.).
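
    A minimal sketch of the handoff hygiene described above, written so it runs on a host PC rather than on the target. The `core_model_t` struct and both function names are hypothetical stand-ins for the real registers (SCB->ICSR, SCB->VTOR, SysTick->CTRL and the NVIC enable/pending banks); the ICSR bit positions are the ARMv7-M ones. It only illustrates the order of operations, not any particular bootloader.

```c
#include <assert.h>
#include <stdint.h>

/* ICSR bit positions on ARMv7-M (Cortex-M3/M4). */
#define ICSR_PENDSVSET (UINT32_C(1) << 28) /* reads 1 while PendSV is pending */
#define ICSR_PENDSVCLR (UINT32_C(1) << 27) /* write 1 to clear pending PendSV */
#define ICSR_PENDSTSET (UINT32_C(1) << 26) /* reads 1 while SysTick is pending */
#define ICSR_PENDSTCLR (UINT32_C(1) << 25) /* write 1 to clear pending SysTick */

/* Hypothetical host-side model of the core state a bootloader leaves behind. */
typedef struct {
    uint32_t icsr;            /* models the pending bits of SCB->ICSR */
    uint32_t vtor;            /* models SCB->VTOR */
    uint32_t systick_ctrl;    /* models SysTick->CTRL */
    uint32_t nvic_enabled[8]; /* models the NVIC enable state */
    uint32_t nvic_pending[8]; /* models the NVIC pending state */
} core_model_t;

/* Models the write-one-to-clear behaviour of the ICSR clear bits. */
static void model_icsr_write(core_model_t *c, uint32_t value)
{
    if (value & ICSR_PENDSVCLR) c->icsr &= ~ICSR_PENDSVSET;
    if (value & ICSR_PENDSTCLR) c->icsr &= ~ICSR_PENDSTSET;
}

/* Quiesce everything that could fire through a stale vector table, then
 * point the core at the application's table. */
static void prepare_handoff(core_model_t *c, uint32_t app_vector_table)
{
    assert(c != 0);

    c->systick_ctrl = 0;        /* target: SysTick->CTRL = 0 */
    for (int i = 0; i < 8; i++) {
        c->nvic_enabled[i] = 0; /* target: NVIC->ICER[i] = 0xFFFFFFFF */
        c->nvic_pending[i] = 0; /* target: NVIC->ICPR[i] = 0xFFFFFFFF */
    }
    /* target: SCB->ICSR = PENDSVCLR | PENDSTCLR */
    model_icsr_write(c, ICSR_PENDSVCLR | ICSR_PENDSTCLR);
    c->vtor = app_vector_table; /* target: SCB->VTOR = app_vector_table */
}
```

    On target, the same sequence would sit between `__disable_irq()` and the jump to the application's reset handler. Whether the application or the loader re-enables interrupts afterwards is a project decision this sketch leaves open.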

    APereiraAuthor
    Visitor II
    May 26, 2022

    That's what I've been doing so far. But to be clear, here are my answers again, laid out so everyone can see what I'm answering.

    "And your workaround is to clear all (?) RAM on startup? In the pre-bootloader? Bootloader? App?"

    The workaround is in the pre-bootloader.

    "Do either your pre-bootloader or bootloader use FreeRTOS, or only your main application?"

    No, only in the App.

    "Make DARN SURE that your pre-boot and bootloaders are disabling any interrupts that they enabled before jumping to your application."

    First thing we tried, didn't work.

    "If the pendSV interrupt is firing, then my first presumption is that either (a) some code BEFORE that startup code you highlighted ran FreeRTOS and somehow it left some interrupt enabled that ran some (OLD) task that set the pendSV bit to call the RTOS, or (b) you have a pointer issue somewhere before that code (perhaps in the bootloader) that is setting the pendSV bit by accident (highly unlikely). Since this happens in the startup code, the interrupt vector table pointer has (probably) not been updated yet, so any interrupt or fault that happens will jump through an OLD vector table. For example, if this occurs in your application, the pendSV vector will be fetched from the vector table used by your pre-boot or bootloader. As will any OTHER interrupt that might still be enabled (systick, timer, UART, etc.)."

    De-initializing everything before firmware transitions and clearing pending IRQ requests didn't work either. One of my assumptions, well before we found the weird behavior shown in the yellow part of the code in the picture above, was a possible microcontroller protection triggering a hard fault because we were changing the uC frequency, but no, the problem wasn't there either.

    Super User
    May 26, 2022

    I just re-read your original post. Sorry - I missed where you initially said this happened in the app, not the pre or bootloader.

    "the next time we reset the watch"

    Does this mean "next time we reset the watchdog" (i.e. next time the watchdog timer causes a reset)? Or does this mean your device is a "watch" style device, and the next time it resets?

    "One fix is, for the standby and reboot we de-initialize all hardware and make a reset "

    How do you "reboot" if not by causing a reset (like via the NVIC AIRCR register)? Which would already cause all (internal to the CPU) hardware to reset/de-initialize.
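
    For reference, a software reset through AIRCR comes down to a single register write, and the CMSIS `NVIC_SystemReset()` helper performs the equivalent write. The sketch below only computes the value on a host PC; `aircr_reset_value` is a hypothetical name, and the field layout is the ARMv7-M one (write key 0x05FA in the top half-word, PRIGROUP in bits 10:8, SYSRESETREQ in bit 2).

```c
#include <assert.h>
#include <stdint.h>

#define AIRCR_VECTKEY_POS   16u
#define AIRCR_VECTKEY_MASK  (UINT32_C(0xFFFF) << AIRCR_VECTKEY_POS)
#define AIRCR_VECTKEY       (UINT32_C(0x05FA) << AIRCR_VECTKEY_POS) /* write key */
#define AIRCR_PRIGROUP_MASK (UINT32_C(7) << 8)
#define AIRCR_SYSRESETREQ   (UINT32_C(1) << 2)

/* Builds the word to write to SCB->AIRCR to request a system reset.
 * The current priority grouping is preserved; the key read back from the
 * register (0xFA05) is replaced by the write key (0x05FA). */
static uint32_t aircr_reset_value(uint32_t current_aircr)
{
    uint32_t value = AIRCR_VECTKEY
                   | (current_aircr & AIRCR_PRIGROUP_MASK)
                   | AIRCR_SYSRESETREQ;

    /* Without the correct key the write would be ignored by the core. */
    assert((value & AIRCR_VECTKEY_MASK) == AIRCR_VECTKEY);
    return value;
}
```

    A reset requested this way leaves SRAM contents in place as long as power is retained, which matches the behaviour discussed earlier in the thread.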

    APereiraAuthor
    Visitor II
    May 26, 2022

    "Does this mean "next time we reset the watchdog" (i.e. next time the watchdog timer causes a reset)? Or does this mean your device is a "watch" style device, and the next time it resets?"

    The watchdog was not implemented at the time, so no issues there, but the device is indeed a watch, with unique features that I'm not allowed to talk about.

    "How do you "reboot" if not by causing a reset (like via the NVIC AIRCR register)? Which would already cause all (internal to the CPU) hardware to reset/de-initialize."

    With the standard existing commands for software reset, and it worked 100% of the time. This was to reboot the device safely in normal conditions.

    Explorer
    May 27, 2022

    There is nothing here indicating a flaw in FreeRTOS.

    As you possess all the source code, and can (presumably) reproduce the fault, you might instrument that code to isolate its cause.

    Instrument means to change the code to collect and expose details to assist you to find what is happening and why.
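
    One possible form of such instrumentation, sketched under assumptions: a small breadcrumb record kept in RAM that survives a reset, so that after a stall the next boot can report how far the previous one got. All names here (`boot_crumb_t`, `crumb_boot`, `crumb_mark`, the stage numbers and the magic value) are hypothetical, and the code runs on a host PC with the record passed in by pointer.

```c
#include <assert.h>
#include <stdint.h>

#define CRUMB_MAGIC UINT32_C(0xB007C0DE) /* hypothetical validity marker */

/* Hypothetical stage numbers for the three firmware images. */
enum { STAGE_NONE = 0, STAGE_PREBOOT = 1, STAGE_BOOTLOADER = 2, STAGE_APP = 3 };

typedef struct {
    uint32_t magic;       /* CRUMB_MAGIC when the record is valid */
    uint32_t reset_count; /* resets seen since the record was created */
    uint32_t last_stage;  /* last stage reached before the reset */
    uint32_t check;       /* XOR of the three fields above */
} boot_crumb_t;

static uint32_t crumb_check(const boot_crumb_t *c)
{
    return c->magic ^ c->reset_count ^ c->last_stage;
}

/* Call once, early in boot. Returns 1 if a valid record survived the reset
 * (its last_stage then tells where the previous boot stopped), or 0 if the
 * RAM held garbage and the record was freshly initialised. */
static int crumb_boot(boot_crumb_t *c)
{
    assert(c != 0);
    if (c->magic == CRUMB_MAGIC && c->check == crumb_check(c)) {
        c->reset_count++;
        c->check = crumb_check(c);
        return 1;
    }
    c->magic = CRUMB_MAGIC;
    c->reset_count = 0;
    c->last_stage = STAGE_NONE;
    c->check = crumb_check(c);
    return 0;
}

/* Record that a stage has been reached. */
static void crumb_mark(boot_crumb_t *c, uint32_t stage)
{
    assert(c != 0);
    c->last_stage = stage;
    c->check = crumb_check(c);
}
```

    On target, the record would have to live in a RAM region that the startup code neither zeroes nor initialises, and a workaround that clears RAM in the pre-bootloader would need to skip it. How that region is declared depends on the toolchain and linker script.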

    How are you resetting the core?

    Does the fault occur without the watchdog enabled? Instrumenting and debugging would be easier without it.

    Disable/remove all features of your software that are not required to reproduce the fault. Just the exercise of ascertaining whether a feature is required or not would assist you to isolate the cause.