Explorer II
December 7, 2023
Question

Probability of peripheral failure on a STM32H7

In the context of a functional safety application, we are wondering about the probability of a status bit failure on an STM32H753.

Our low-level software frequently polls the status bits of peripherals. If one of these peripherals encounters a failure, this is obviously an issue. Even though there are solutions to this issue (a watchdog, for example), we'd like to assess the probability of such failures.

Are there any studies regarding the probability of a peripheral failure on STM32 (due to cosmic rays, a bit flip or a bit stuck in a register, whatever)?

    5 replies

    Super User
    December 7, 2023

    Why would such an assessment be constrained to peripheral status bits?

    Surely, cosmic rays etc. impact all other registers in the STM32 in a similar way. Many of them are in the processor itself, the bus matrix and the surrounding support circuitry; and possibly the largest pool of registers is the RAM.

    Generally, what you need, and can probably get upon request from ST (directly, through an FAE or through the web support form), is a FIT (failures in time) number.
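
    As a rough illustration of what a FIT number gives you: one FIT is one failure per 10^9 device-hours, so it converts directly into a failure rate per hour. The sketch below assumes a constant failure rate; the function names are made up for this example, and the 100 FIT figure in the usage note is a placeholder, not an ST value.

```c
#include <assert.h>

/* One FIT = one failure per 1e9 device-hours. */
#define DEVICE_HOURS_PER_FIT 1.0e9

/* Constant failure rate (failures per hour) for a given FIT value. */
double fit_to_failures_per_hour(double fit)
{
    assert(fit >= 0.0);
    return fit / DEVICE_HOURS_PER_FIT;
}

/*
 * Expected number of failures for `devices` parts running `hours` each.
 * For a single device and a small result (much less than 1) this is also
 * a good approximation of the probability of at least one failure,
 * since 1 - exp(-x) is close to x when x is small.
 */
double expected_failures(double fit, double hours, unsigned devices)
{
    assert(hours >= 0.0);
    return fit_to_failures_per_hour(fit) * hours * (double)devices;
}

/* Mean time between failures, in hours, under the same constant-rate model. */
double mtbf_hours(double fit)
{
    assert(fit > 0.0);
    return DEVICE_HOURS_PER_FIT / fit;
}
```

    With the placeholder value of 100 FIT, `expected_failures(100.0, 87600.0, 1)` (one device, ten years) comes out at about 0.00876, i.e. a bit under 1 %; for a fleet of 1000 devices the same call returns an expected count of about 8.76 failures, which is no longer a probability.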

    JW

    Graduate II
    December 7, 2023

    Not sure it's rated for space operation.

    But generally, this is why you don't loop infinitely on things, and instead have timeouts so you can respond / recover in an orderly fashion.
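
    A minimal sketch of that idea, assuming a memory-mapped status register read through a `volatile` pointer. `poll_flag_bounded` and `poll_status_t` are hypothetical names, not part of any ST library, and the bound here is a simple read count; on real hardware you would more likely derive it from a timer or tick counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    POLL_OK = 0,      /* flag reached the expected state */
    POLL_TIMEOUT = 1  /* gave up after max_polls reads */
} poll_status_t;

/*
 * Poll until (*reg & mask) == expected, but never more than max_polls times.
 * The caller gets an explicit error instead of hanging forever if the bit
 * is stuck or the peripheral has failed.
 */
poll_status_t poll_flag_bounded(const volatile uint32_t *reg,
                                uint32_t mask,
                                uint32_t expected,
                                uint32_t max_polls)
{
    assert(reg != NULL);

    for (uint32_t i = 0; i < max_polls; ++i) {
        if ((*reg & mask) == expected) {
            return POLL_OK;
        }
    }
    return POLL_TIMEOUT;
}
```

    The caller then decides what "recover" means on `POLL_TIMEOUT`: reset the peripheral, switch to a safe state, log and retry, etc. That decision is application specific and is not shown here.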

    Super User
    December 7, 2023

    Why space operation? Just a usual freakin' laser anti-drone gun ; )

    Technical Moderator
    December 7, 2023
    Super User
    December 8, 2023

    Are you interested in (A) real safety, or (B) in fulfilling some external requirements e.g. for some sort of certification?

    In case of (A), the probability of one flip-flop (the status register's bit) failing is several orders of magnitude lower than the probability of a failure in the thousands of gates needed to execute the "control mechanism".

    And that is several orders of magnitude lower than the probability of having a software bug.

    In other words, the timeout mechanism may be not only redundant, it may even be harmful. I don't really have a universal solution for (A), as that has to be judged case by case (yes, it's very, very, very, very expensive, and the result of the analysis may turn out to be that, given the circumstances, there's nothing that would make it better).

    Now in case of (B), you have no choice, no decision, and no options to contemplate. You simply have to fulfill the requirements given externally, not questioning their rationale, as they may be purely legal or administrative i.e. non-technical, non-rational. Plus you have the extra burden of not making things much worse in that process.

    Sorry, but it is what it is.

    JW

    Super User
    December 8, 2023

    One possible answer to this question is Cortex-R. Basically it is duplication of all MCU functions, which should help against "random glitches". But this is exotic and expensive, and its efficiency is not obvious. When possible, double or triple redundancy is done at the level of larger modules, each of which includes a normal Cortex MCU (not R) together with sensors, memories etc., integrated by an external "arbiter" circuit. The "arbiter" decides which instance will drive the output signals and which is faulty. Of course this is complex and costly - but it can be done with widely available parts.
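
    To make the arbiter idea concrete, here is a small software model of a 2-out-of-3 voter. In the setup described above the arbiter is an external circuit, so this is only an illustration of the decision logic; `vote_2oo3`, `arbitrate` and the return codes are hypothetical names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_FAULTY_CHANNEL (-1)  /* all three channels agree */
#define NO_MAJORITY       (-2)  /* all three differ: output not trustworthy */

/* Bitwise 2-out-of-3 majority: each output bit follows at least two inputs. */
uint32_t vote_2oo3(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/*
 * Compare the outputs of three redundant modules.
 * Writes the voted value to *out and returns:
 *   0..2              index of the single channel that disagrees,
 *   NO_FAULTY_CHANNEL if all channels agree,
 *   NO_MAJORITY       if no two channels agree.
 */
int arbitrate(const uint32_t ch[3], uint32_t *out)
{
    assert(ch != NULL && out != NULL);

    *out = vote_2oo3(ch[0], ch[1], ch[2]);

    if (ch[0] == ch[1] && ch[1] == ch[2]) {
        return NO_FAULTY_CHANNEL;
    }
    if (ch[0] == ch[1]) {
        return 2;
    }
    if (ch[0] == ch[2]) {
        return 1;
    }
    if (ch[1] == ch[2]) {
        return 0;
    }
    return NO_MAJORITY;
}
```

    With only two modules the arbiter can detect a disagreement but cannot tell which side is wrong, which is why the triple variant is the one that can keep driving outputs after a single fault.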