Explorer II

Question

Probability of peripheral failure on a STM32H7

December 7, 2023
5 replies
2841 views

In the context of a functional safety application, we are wondering what the probability of a status bit failure is on an STM32H753.

Low-level software frequently polls the status bits of peripherals. If such a peripheral encounters a failure, this is obviously an issue. Even though there are solutions to this issue (a watchdog, for example), we'd like to assess the probability of such failures.

Are there any studies on the probability of a peripheral failure on STM32 (due to cosmic rays, a bit flip or a stuck bit in a register, whatever)?

This topic has been closed for replies.

waclawek.jan

Super User

Why would such an assessment be restricted to peripheral status bits?

Surely cosmic rays etc. affect all the other registers in the STM32 in a similar way; many of those are in the processor itself, the bus matrix and the surrounding support circuitry, and possibly the largest pool of registers is the RAM.

Generally, what you need, and can probably get on request from ST (directly, through an FAE or via the web support form), is a FIT number.
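As a rough illustration of how a FIT number is used once you have one: FIT counts failures per 10^9 device-hours, so under a constant failure rate the expected number of failures over a mission is FIT × 1e-9 × hours (for small values this is approximately the probability of at least one failure). The sketch below assumes that model; the function names are hypothetical and the 100 FIT figure in the usage note is a placeholder, not ST data.

```c
#include <assert.h>

/* Expected number of failures over `hours` of operation, assuming a
 * constant failure rate given in FIT (failures per 1e9 device-hours).
 * For results much smaller than 1 this approximates the probability
 * of at least one failure. */
double fit_to_expected_failures(double fit, double hours)
{
    assert(fit >= 0.0 && hours >= 0.0);
    return fit * 1e-9 * hours;
}

/* Mean time to failure in hours for the same constant-rate model. */
double fit_to_mttf_hours(double fit)
{
    assert(fit > 0.0);
    return 1e9 / fit;
}
```

For example, with a placeholder value of 100 FIT and 10,000 operating hours, `fit_to_expected_failures(100.0, 10000.0)` gives about 0.001, i.e. roughly a 0.1 % chance of a failure over that period under this model.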

JW

Tesla DeLorean

Graduate II

Not sure it's rated for space operation.

But generally, this is why you don't loop forever on things and instead have timeouts, so you can respond / recover in an orderly fashion.
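A minimal sketch of that idea, assuming a plain iteration budget as the timeout (on real hardware you would more likely compare against a tick counter or hardware timer). `poll_flag` and `poll_result_t` are hypothetical names, not part of any ST library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { POLL_OK, POLL_TIMEOUT } poll_result_t;

/* Poll a status register until all bits in `mask` are set, giving up
 * after `max_iterations` reads instead of spinning forever. */
poll_result_t poll_flag(const volatile uint32_t *reg, uint32_t mask,
                        uint32_t max_iterations)
{
    assert(reg != NULL);
    for (uint32_t i = 0; i < max_iterations; ++i) {
        if ((*reg & mask) == mask) {
            return POLL_OK;      /* flag seen within the budget */
        }
    }
    return POLL_TIMEOUT;         /* caller decides how to recover */
}
```

The caller then handles `POLL_TIMEOUT` explicitly (reset the peripheral, report a fault, enter a safe state) rather than hanging until a watchdog bites.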

Pavel A.

Super User

Why space operation? Just a usual freakin' laser anti-drone gun ;)

mƎALLEm

Technical Moderator

Hello,

I don't know if this helps:

https://www.opensourcesatellite.org/downloads/KS-DOC-01251_STM32H7_Radiation_Test_Report.pdf

waclawek.jan

Super User

Are you interested in (A) real safety, or (B) in fulfilling some external requirements e.g. for some sort of certification?

In case of (A), the probability of one flip-flop (the status register's bit) failing is several orders of magnitude lower than the probability of a failure in the thousands of gates needed to execute the "control mechanism".

And that is several orders of magnitude lower than the probability of having a software bug.

In other words, the timeout mechanism may be not only redundant but even harmful. I don't really have a universal solution for (A), as that has to be judged case by case (yes, it's very, very, very, very expensive, and the analysis may conclude that, given the circumstances, nothing can be done to make it better).

Now in case of (B), you have no choice, no decision, and no options to contemplate. You simply have to fulfill the requirements given externally, not questioning their rationale, as they may be purely legal or administrative i.e. non-technical, non-rational. Plus you have the extra burden of not making things much worse in that process.

Sorry, but it is what it is.

JW

Pavel A.

Super User

One possible answer to this question is Cortex-R. Basically, it duplicates all MCU functions, which should help against "random glitches". But this is exotic and expensive, and its effectiveness is not obvious. Where possible, double or triple redundancy is implemented at the level of larger modules that include a normal Cortex MCU (not R) together with sensors, memories etc., integrated by an external "arbiter" circuit. The "arbiter" decides which instance will drive the output signals and which is faulty. Of course this is complex and costly - but it can be done with widely available parts.
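To make the arbiter idea concrete, here is a minimal software sketch of a 2-out-of-3 majority vote. It assumes the three modules produce integer outputs that can be compared exactly; a real arbiter would be external hardware and would usually need tolerance bands and timing checks. `vote_2oo3` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 2-out-of-3 vote over three redundant channel outputs.
 * Returns true and writes the agreed value to *out when at least two
 * channels match. *faulty gets the index (0..2) of the disagreeing
 * channel, or -1 if all three agree or no majority exists. */
bool vote_2oo3(uint32_t a, uint32_t b, uint32_t c,
               uint32_t *out, int *faulty)
{
    assert(out != NULL && faulty != NULL);
    if (a == b) { *out = a; *faulty = (c == a) ? -1 : 2; return true; }
    if (a == c) { *out = a; *faulty = 1; return true; }
    if (b == c) { *out = b; *faulty = 0; return true; }
    *faulty = -1;   /* all three differ: no channel can be trusted */
    return false;
}
```

Note that the voter itself becomes a single point of failure, which is part of why this approach is complex and costly.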
