Explorer
May 14, 2021
Question

A new lot of production units is hitting a usage fault with INVPC status in the UFSR.

We have a situation on an STM32F765 running FreeRTOS in one of our avionics products that is currently in production. Units from a new production lot are hitting a usage fault with the INVPC status bit set in the UFSR.

Can anyone advise on how to find the last PC value before the exception happened? The main/process stack does not reflect this.

Also, looking at the errata, there seems to be an issue around the use of the data cache. Does this correlate in any way with the issue we are facing?
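For readers following along, here is a minimal sketch of how the status described above could be decoded from a captured register value. On Armv7-M the UFSR is the upper halfword of the CFSR (at 0xE000ED28), and INVPC is UFSR bit 2, which indicates an illegal PC load on exception return (an invalid EXC_RETURN value). The helper names are hypothetical and not from the original post; the functions take the raw CFSR value as an argument so they can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The UFSR occupies bits [31:16] of the CFSR (address 0xE000ED28). */
#define CFSR_UFSR_SHIFT  16u

/* UFSR flag bits as defined for Armv7-M. */
#define UFSR_UNDEFINSTR  (1u << 0)
#define UFSR_INVSTATE    (1u << 1)
#define UFSR_INVPC       (1u << 2)
#define UFSR_NOCP        (1u << 3)
#define UFSR_UNALIGNED   (1u << 8)
#define UFSR_DIVBYZERO   (1u << 9)

/* INVPC therefore shows up as bit 18 of the full CFSR word. */
static_assert((UFSR_INVPC << CFSR_UFSR_SHIFT) == 0x00040000u,
              "INVPC must map to CFSR bit 18");

/* Hypothetical helper: extract the UFSR halfword from a raw CFSR value. */
uint16_t ufsr_from_cfsr(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> CFSR_UFSR_SHIFT);
}

/* Hypothetical helper: true when the INVPC flag is set. */
bool ufsr_is_invpc(uint32_t cfsr)
{
    return (ufsr_from_cfsr(cfsr) & UFSR_INVPC) != 0u;
}

/* Hypothetical helper: name of the lowest-numbered UFSR flag that is set. */
const char *ufsr_first_cause(uint32_t cfsr)
{
    uint16_t ufsr = ufsr_from_cfsr(cfsr);

    if (ufsr & UFSR_UNDEFINSTR) return "UNDEFINSTR";
    if (ufsr & UFSR_INVSTATE)   return "INVSTATE";
    if (ufsr & UFSR_INVPC)      return "INVPC";
    if (ufsr & UFSR_NOCP)       return "NOCP";
    if (ufsr & UFSR_UNALIGNED)  return "UNALIGNED";
    if (ufsr & UFSR_DIVBYZERO)  return "DIVBYZERO";
    return "none";
}
```

On target, the argument would typically be a copy of `SCB->CFSR` saved at the very start of the fault handler, before anything clears the sticky bits.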

    4 replies

    Graduate II
    May 14, 2021

    >>the main/process stack does not reflect this.

    What does it reflect?

    Perhaps provide a more thorough dump of the registers at the fault, and of the stack the fault used; check LR (the EXC_RETURN value) to determine which stack that was.

    Would look for callbacks that aren't initialized, empty vectors, and stack corruption/overrun.
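The advice above about using LR to find the right stack can be sketched as follows. On exception entry, LR holds an EXC_RETURN value; bit 2 tells whether the hardware pushed the exception frame onto the process stack (PSP, typical for FreeRTOS tasks) or the main stack (MSP). The stacked PC is the seventh word of that frame. This is only a sketch under assumptions: the type and function names are hypothetical, the basic eight-word frame is assumed (no extended FPU frame handling), and on target a small assembly shim, not shown here, would have to pass LR, MSP and PSP in before the compiler's prologue touches the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame pushed by hardware on exception entry. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* LR of the interrupted code            */
    uint32_t pc;    /* stacked return address                */
    uint32_t xpsr;  /* program status at the time of fault   */
} fault_frame_t;

/* EXC_RETURN bit 2: 1 = frame is on PSP, 0 = frame is on MSP. */
#define EXC_RETURN_SPSEL  (1u << 2)

/*
 * Hypothetical helper: copy the hardware-stacked registers out of
 * whichever stack EXC_RETURN says was in use when the fault was taken.
 * msp/psp must be the stack pointer values at handler entry.
 */
void fault_frame_capture(uint32_t exc_return,
                         const uint32_t *msp,
                         const uint32_t *psp,
                         fault_frame_t *out)
{
    assert(msp != NULL && psp != NULL && out != NULL);

    const uint32_t *sp = (exc_return & EXC_RETURN_SPSEL) ? psp : msp;

    out->r0   = sp[0];
    out->r1   = sp[1];
    out->r2   = sp[2];
    out->r3   = sp[3];
    out->r12  = sp[4];
    out->lr   = sp[5];
    out->pc   = sp[6];
    out->xpsr = sp[7];
}
```

Storing the captured frame in a global (or a no-init RAM section that survives reset) makes it readable from a debugger or a post-mortem log. If the stack itself is corrupted, as suggested above, the captured values will be unreliable, which is a useful clue in its own right.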

    Graduate II
    May 18, 2021

    Random errors often point to an erratic supply, an overheated device, too high a clock frequency, or too few wait states.

    Graduate II
    May 18, 2021

    Yes, I would definitely back off on the flash wait states. Some of the ST examples seem a bit aggressive, and to be honest most of the parts with ART or other caching do a good job of masking the slowness of the FLASH, and do have a more aggressive prefetch path than SRAM offers.
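As a rough illustration of "backing off", the sketch below computes a minimum wait-state count and then adds margin. It assumes the 2.7 V to 3.6 V supply range, where the STM32F7 reference manuals list one additional wait state per 30 MHz of HCLK up to 216 MHz; that figure and the function names are assumptions for illustration and should be checked against the reference manual for the exact part and voltage range in use.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2.7-3.6 V range, one wait state per 30 MHz of HCLK. */
#define FLASH_MHZ_PER_WS  30u
#define HCLK_MAX_MHZ      216u
#define FLASH_WS_MAX      15u   /* LATENCY is a 4-bit field in FLASH_ACR */

/* Hypothetical helper: minimum wait states for a given HCLK in MHz. */
uint32_t flash_min_wait_states(uint32_t hclk_mhz)
{
    assert(hclk_mhz > 0u && hclk_mhz <= HCLK_MAX_MHZ);
    return (hclk_mhz - 1u) / FLASH_MHZ_PER_WS;
}

/* Hypothetical helper: minimum wait states plus extra margin, clamped. */
uint32_t flash_wait_states_with_margin(uint32_t hclk_mhz, uint32_t extra_ws)
{
    uint32_t ws = flash_min_wait_states(hclk_mhz) + extra_ws;
    return (ws > FLASH_WS_MAX) ? FLASH_WS_MAX : ws;
}
```

Running suspect units with one or two extra wait states is a cheap experiment: if the faults disappear, timing margin on the flash interface becomes a strong suspect.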

    The F2 parts did have a critical path in the ART/Prefetch, which seemed to be particularly triggered by GNU/GCC generated code.

    This F7 part has a very early version of the CM7 core, only the F74x/F75x parts use it, all the subsequent parts use newer cores.

    I don't think one batch vs the next should be particularly susceptible to the errata; you should read the Device ID and stepping from the DBGMCU registers. Process variation could change the transistor speeds, and this might make the parts more susceptible to supply voltage, but the process window should be fairly tight/constrained to meet specs. Of the things you can change, the Flash Wait States would be the first thing to look at.
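Reading the device ID and stepping, as suggested above, can be sketched like this. On STM32F7 parts the DBGMCU_IDCODE register sits at 0xE0042000, with DEV_ID in bits [11:0] and REV_ID in bits [31:16]. The struct and function names are hypothetical, and the decoder takes the raw register value as an argument so it can be checked off-target; on hardware the value would come from a volatile read of that address (or from the vendor header's `DBGMCU->IDCODE`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of DBGMCU_IDCODE on STM32F7 devices. */
#define DBGMCU_IDCODE_ADDR  0xE0042000u

typedef struct {
    uint16_t dev_id;  /* bits [11:0]: device family identifier */
    uint16_t rev_id;  /* bits [31:16]: silicon revision code   */
} mcu_id_t;

/* Hypothetical helper: split a raw IDCODE value into its two fields. */
void mcu_id_decode(uint32_t idcode, mcu_id_t *out)
{
    assert(out != NULL);
    out->dev_id = (uint16_t)(idcode & 0x0FFFu);
    out->rev_id = (uint16_t)(idcode >> 16);
}
```

Logging both fields from known-good and failing units, alongside the lot and date codes printed on the package, gives a quick way to see whether the new lot differs in silicon revision at all.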

    Also look at what's happening with VCAP pins, the voltages, and the capacitors placed. Issues here have been seen to generate the types of failure reported.

    ShinoyAuthor
    Explorer
    May 20, 2021

    Our hardware has been in production for close to a year now, and this issue started appearing recently with newly produced units.

    I checked the REV_ID in the DBGMCU register of one of the faulty units, and the good news is that it matches the one mentioned in the errata.

    Visitor II
    May 21, 2021

    Hi @Shinoy​ ,

    This behavior is not specific to the latest version of the product. It applies to all versions and to all Cortex-M7-based products.

    Normally, the problem was related to D-Cache which was causing some crashes. But what I propose for you is to contact an FAE. This request may need an FAR (Failure Analysis Request) to manage with an FAE.

    With Best Regards,

    Ons

    Graduate II
    May 22, 2021
    /* Enable branch prediction */
    SCB->CCR |= (1 << 18);
    __DSB();
    SCB_InvalidateICache();
    SCB_EnableICache();
    SCB_InvalidateDCache();
    SCB_EnableDCache();

    Of all these lines, only the two enable calls are useful. Invalidation is already done internally by the enable functions, and the BP (branch prediction) and STKALIGN (stack alignment) bits in the SCB_CCR register are read-only.

    https://developer.arm.com/documentation/dui0646/a/cortex-m7-peripherals/system-control-block/configuration-and-control-register

    As for the issue... While it can be a board-level or even MCU-level hardware issue, 99% of problems are caused by broken software. Typically there are many "unimportant" hidden issues that developers ignore, just waiting to hit, for example a wrong voltage scale or FLASH latency setting. As you tested, this one seems to be related to cache memory. Are you sure about the MPU configuration? Note that broken cache management can lead to invalid function pointers. For a proper cache management example, read my answer here:

    https://community.st.com/s/question/0D53W00000oXSzySAG/different-cache-behavior-between-stm32h7-and-stm32f7
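One common way cache maintenance goes wrong is operating on a range that is not aligned to the 32-byte Cortex-M7 D-cache line: invalidating a line that a DMA buffer shares with unrelated variables can discard pending writes to those variables, which is one plausible route to a corrupted function pointer. The sketch below is not taken from the linked answer; it only illustrates the alignment arithmetic, with hypothetical names, and leaves the actual maintenance call (for example CMSIS `SCB_CleanDCache_by_Addr` or `SCB_InvalidateDCache_by_Addr`) to target code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M7 data cache line size in bytes. */
#define DCACHE_LINE_SIZE  32u

typedef struct {
    uintptr_t start;   /* line-aligned start address          */
    uint32_t  length;  /* length in bytes, multiple of a line */
} cache_range_t;

/* Hypothetical helper: expand [addr, addr + size) to whole cache lines. */
cache_range_t cache_range_align(uintptr_t addr, uint32_t size)
{
    assert(size > 0u);

    const uintptr_t mask  = (uintptr_t)(DCACHE_LINE_SIZE - 1u);
    const uintptr_t start = addr & ~mask;
    const uintptr_t end   = (addr + size + mask) & ~mask;

    cache_range_t range = { start, (uint32_t)(end - start) };
    return range;
}

/*
 * Hypothetical helper: true when the buffer owns its cache lines outright,
 * i.e. invalidating it cannot touch neighbouring data.
 */
bool cache_range_is_exclusive(uintptr_t addr, uint32_t size)
{
    return (addr % DCACHE_LINE_SIZE == 0u) && (size % DCACHE_LINE_SIZE == 0u);
}
```

A buffer for which `cache_range_is_exclusive` returns false is safe to clean but risky to invalidate; aligning and padding DMA buffers to 32 bytes, or placing them in a non-cacheable MPU region, avoids the problem.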