STM32H7S XSPI1 Memory Fault Despite Errata Workaround Implementation

Question

System Overview

Hardware Configuration

Microcontroller: STM32H7S3L8H6

Memory Architecture

XSPI1 (Flash, MX25UW25645GXDI00): 32MB, accessed via GPIOP/GPIOO
Clock: PLL2S source, Prescaler = 3

    flash_xspi_handle.Instance = config->instance;
    flash_xspi_handle.Init.FifoThresholdByte = 1;
    flash_xspi_handle.Init.MemoryMode = HAL_XSPI_SINGLE_MEM;
    flash_xspi_handle.Init.MemoryType = HAL_XSPI_MEMTYPE_MACRONIX;
    flash_xspi_handle.Init.MemorySize = HAL_XSPI_SIZE_256MB;
    flash_xspi_handle.Init.ChipSelectHighTimeCycle = 2;
    flash_xspi_handle.Init.FreeRunningClock = HAL_XSPI_FREERUNCLK_DISABLE;
    flash_xspi_handle.Init.ClockMode = HAL_XSPI_CLOCK_MODE_0;
    flash_xspi_handle.Init.WrapSize = HAL_XSPI_WRAP_NOT_SUPPORTED;
    flash_xspi_handle.Init.ClockPrescaler = 3;
    flash_xspi_handle.Init.SampleShifting = HAL_XSPI_SAMPLE_SHIFT_NONE;
    flash_xspi_handle.Init.DelayHoldQuarterCycle = HAL_XSPI_DHQC_ENABLE;
    flash_xspi_handle.Init.ChipSelectBoundary = HAL_XSPI_BONDARYOF_NONE;
    flash_xspi_handle.Init.MaxTran = 0;
    flash_xspi_handle.Init.Refresh = 0;
    flash_xspi_handle.Init.MemorySelect = HAL_XSPI_CSSEL_NCS1;

XSPI2 (HyperRAM, W956D8MBYA5I): 8MB, accessed via GPION
Clock: PLL2S source, Prescaler = 1

    ram_xspi_handle.Instance = config->instance;
    ram_xspi_handle.Init.FifoThresholdByte = 1;
    ram_xspi_handle.Init.MemoryMode = HAL_XSPI_SINGLE_MEM;
    ram_xspi_handle.Init.MemoryType = HAL_XSPI_MEMTYPE_HYPERBUS;
    ram_xspi_handle.Init.MemorySize = HAL_XSPI_SIZE_64MB;
    ram_xspi_handle.Init.ChipSelectHighTimeCycle = 5;
    ram_xspi_handle.Init.FreeRunningClock = HAL_XSPI_FREERUNCLK_DISABLE;
    ram_xspi_handle.Init.ClockMode = HAL_XSPI_CLOCK_MODE_0;
    ram_xspi_handle.Init.WrapSize = HAL_XSPI_WRAP_NOT_SUPPORTED;
    ram_xspi_handle.Init.ClockPrescaler = 1;
    ram_xspi_handle.Init.SampleShifting = HAL_XSPI_SAMPLE_SHIFT_NONE;
    ram_xspi_handle.Init.DelayHoldQuarterCycle = HAL_XSPI_DHQC_DISABLE;
    ram_xspi_handle.Init.ChipSelectBoundary = HAL_XSPI_BONDARYOF_NONE;
    ram_xspi_handle.Init.MaxTran = 0;
    ram_xspi_handle.Init.Refresh = 355;
    ram_xspi_handle.Init.MemorySelect = HAL_XSPI_CSSEL_NCS1;

MPU Region Map

I-Cache: Enabled (instruction fetch optimization)
D-Cache: Enabled (data fetch optimization)
Both caches are critical for maintaining performance with external memory access at 600 MHz.

System Clocking

CPU Core: 600 MHz
HCLK (AHB): 300 MHz
Both XSPI instances: PLL2S clock source
PLL2S = 200 MHz
HyperRAM = 100 MHz
Flash = 50 MHz

Power Domains

I/O Supply (XSPI signals): 1.8V
Core Supply: 1.8V

Issue Description

I am getting garbage data when reading from the external flash memory during normal execution of the application, under some circumstances. In-depth details are at the end.

As recommended, I have implemented the I/O compensation cell workaround described in the STM32H7Rxx/7Sxx errata sheet for the duty-cycle skew issue. However, I still (sometimes) get memory faults when accessing data from the external flash on XSPI1.

Temperature Correlation:

The error appears significantly more frequently when the MCU is heated, even slightly above room temperature. This strongly suggests ongoing data corruption during XSPI1 communication.

Implemented Workaround

I have implemented the compensation cell adjustment as follows, as per the errata mentioned HERE:

    void xspi_configure_compensation_cells(void)
    {
        const board_config_t *board = board_defs_get_config();

        // Configure compensation cells for XSPI1
        if (board->xspi1.instance != NULL) {
            HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI1_CELL, SBS_IO_CELL_CODE, 0U, 0U);
            HAL_SBS_EnableCompensationCell(SBS_IO_XSPI1_CELL);
            while (HAL_SBS_GetCompensationCellReadyStatus(SBS_IO_XSPI1_CELL_READY) != 1U) {
                // Wait for compensation cell ready
            }

            // ERRATA: I/O compensation duty-cycle skew workaround
            // Apply compensation values adjustment to prevent duty-cycle skew and jitter
            // Read boot-time compensation values from SBS_CCVALR register
            uint32_t boot_psrc = (SBS->CCVALR >> SBS_CCVALR_XSPI1_PSRC_Pos) & 0xFU;
            uint32_t boot_nsrc = (SBS->CCVALR >> SBS_CCVALR_XSPI1_NSRC_Pos) & 0xFU;

            // Apply compensation adjustment per errata specification:
            // SW_Psrc = boot_PSRC - 2
            // SW_Nsrc = boot_NSRC + 2
            int32_t adj_psrc = (int32_t)boot_psrc + XSPI_COMP_PSRC_ADJUSTMENT;
            int32_t adj_nsrc = (int32_t)boot_nsrc + XSPI_COMP_NSRC_ADJUSTMENT;

            // Clamp values to valid 4-bit range [0, 15]
            if (adj_psrc < 0) { adj_psrc = 0; }
            if (adj_psrc > 15) { adj_psrc = 15; }
            if (adj_nsrc < 0) { adj_nsrc = 0; }
            if (adj_nsrc > 15) { adj_nsrc = 15; }

            // Write adjusted compensation values to SBS_CCSWVALR register
            HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI1_CELL, SBS_IO_REGISTER_CODE, adj_nsrc, adj_psrc);
            HAL_SBS_EnableIOSpeedOptimize(SBS_IO_XSPI1_HSLV);
        }

        // Configure compensation cells for XSPI2
        if (board->xspi2.instance != NULL) {
            HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI2_CELL, SBS_IO_CELL_CODE, 0U, 0U);
            HAL_SBS_EnableCompensationCell(SBS_IO_XSPI2_CELL);
            while (HAL_SBS_GetCompensationCellReadyStatus(SBS_IO_XSPI2_CELL_READY) != 1U) {
                // Wait for compensation cell ready
            }

            // ERRATA: I/O compensation duty-cycle skew workaround
            // Apply compensation values adjustment to prevent duty-cycle skew and jitter
            // Read boot-time compensation values from SBS_CCVALR register
            uint32_t boot_psrc = (SBS->CCVALR >> SBS_CCVALR_XSPI2_PSRC_Pos) & 0xFU;
            uint32_t boot_nsrc = (SBS->CCVALR >> SBS_CCVALR_XSPI2_NSRC_Pos) & 0xFU;

            // Apply compensation adjustment per errata specification:
            // SW_Psrc = boot_PSRC - 2
            // SW_Nsrc = boot_NSRC + 2
            int32_t adj_psrc = (int32_t)boot_psrc + XSPI_COMP_PSRC_ADJUSTMENT;
            int32_t adj_nsrc = (int32_t)boot_nsrc + XSPI_COMP_NSRC_ADJUSTMENT;

            // Clamp values to valid 4-bit range [0, 15]
            if (adj_psrc < 0) { adj_psrc = 0; }
            if (adj_psrc > 15) { adj_psrc = 15; }
            if (adj_nsrc < 0) { adj_nsrc = 0; }
            if (adj_nsrc > 15) { adj_nsrc = 15; }

            // Write adjusted compensation values to SBS_CCSWVALR register
            HAL_SBS_ConfigCompensationCell(SBS_IO_XSPI2_CELL, SBS_IO_REGISTER_CODE, adj_nsrc, adj_psrc);
            HAL_SBS_EnableIOSpeedOptimize(SBS_IO_XSPI2_HSLV);
        }
    }

Root Cause Analysis of Main Issue:

When the application runs, I encounter a hard fault with the following characteristics. Please refer to the following fault captures, which are explained in detail below:

IMAGE 1:
IMAGE 2:

Understanding the Memory Fault Above

This is a pointer corruption issue caused by corrupted data being read from external flash memory. Let me break down the failure sequence.

The MCU tries to execute the following C code and faults when executing __HAL_SD_DISABLE_IT(_cyhal_sdio_handle, SDMMC_IT_SDIOIT);

    void stm32_cyhal_sdio_irq_handler(void)
    {
        uint32_t sta_reg = _cyhal_sdio_handle->Instance->STA;
        cyhal_sdio_t* obj = (cyhal_sdio_t*)_cyhal_sdio_handle->Context;

        if ((_cyhal_sdio_handle != NULL) &&
            (__HAL_SD_GET_FLAG(_cyhal_sdio_handle, SDMMC_STA_SDIOIT) != RESET)) {
            ((cyhal_sdio_event_callback_t)obj->callback)(obj->callback_arg, CYHAL_SDIO_CARD_INTERRUPT);

            /* Clear the interrupt */
            __HAL_SD_CLEAR_FLAG(_cyhal_sdio_handle, SDMMC_FLAG_SDIOIT);

            /* Mask interrupt, to be unmasked later by Tx Path */
            __HAL_SD_DISABLE_IT(_cyhal_sdio_handle, SDMMC_IT_SDIOIT);  <- FAULT AT THIS LINE
            ......
    }

The Execution Flow

Understanding the execution flow of the above code with the disassembly: the single line of C code __HAL_SD_DISABLE_IT(_cyhal_sdio_handle, SDMMC_IT_SDIOIT) compiles to the four assembly instructions below, which fail catastrophically (check image 2 for clear reference):

    ldr r3, [pc, #196]   // Load address of _cyhal_sdio_handle
    ldr r3, [r3, #0]     // Dereference _cyhal_sdio_handle
    ldr r3, [r3, #0]     // Access _cyhal_sdio_handle->Instance
    ldr r2, [r3, #60]    // Read Instance->MASK (at offset 60)

Detailed Breakdown

Pointer Dereferences

1. Get the address of the global variable _cyhal_sdio_handle:

    ldr r3, [pc, #196]
    // Equivalent to: &_cyhal_sdio_handle
    // R3 = 0x24021480 (address where the handle pointer is stored)

2. Dereference to get the handle value:

    ldr r3, [r3, #0]
    // Equivalent to: _cyhal_sdio_handle
    // R3 = *0x24021480 = 0x24001b04 (the actual sdio handle)

3. Access the Instance member (first member of the struct, offset 0):

    ldr r3, [r3, #0]
    // Equivalent to: _cyhal_sdio_handle->Instance
    // R3 = *0x24001b04 = 0x48002400 (Instance address)

4. Read the MASK register (at offset 60 bytes in SDMMC_TypeDef):

    ldr r2, [r3, #60]
    // Equivalent to: _cyhal_sdio_handle->Instance->MASK
    // R2 = *(0x48002400 + 60) = 0x40013a (MASK value)

Why This Particular Line Fails

The Corruption Point

The very first instruction reads from the literal pool in external flash:

    ldr r3, [pc, #196]   // Reading from 0x900D1A78 in external flash

This literal pool entry contains the address of _cyhal_sdio_handle (0x24021480), but due to flash corruption the CPU reads UNKNOWN data instead, which starts the domino effect.

The Cascade Effect

Why This Causes a Fault

The corrupted address 0xFA7BAFFF points to unmapped/protected memory space, triggering the memory protection unit (MPU) to generate a hard fault when the CPU attempts to access it.

The corrupted pointer value (0xfa7bafff) indicates that the data read from the external flash via XSPI1 is incorrect. This is not a pointer that exists in my code; it is garbage data resulting from communication errors.

VIDEO

A video showing the corrupted R3, along with the disassembly view showing the call stack, is HERE.

Thank you

ANY GUIDANCE would be greatly appreciated. The thermal sensitivity suggests the compensation workaround is not fully addressing the signal integrity issues at the I/O level. There might also be a different problem.
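The clamp-and-adjust arithmetic in the workaround above is identical for XSPI1 and XSPI2, so it can be pulled into a pure helper and checked on a host machine without the HAL. The following is only a minimal sketch: the helper name xspi_comp_adjust is hypothetical, and the two macro values are assumed from the "- 2" / "+ 2" comments in the posted code, since the actual definitions are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, inferred from the "boot_PSRC - 2" / "boot_NSRC + 2"
 * comments in the posted workaround; the real definitions are not shown. */
#define XSPI_COMP_PSRC_ADJUSTMENT (-2)
#define XSPI_COMP_NSRC_ADJUSTMENT (2)

/* Hypothetical helper: add a signed offset to a 4-bit compensation code
 * and clamp the result to the valid range [0, 15]. */
static uint32_t xspi_comp_adjust(uint32_t boot_code, int32_t adjustment)
{
    assert(boot_code <= 15U); /* caller must already have masked with 0xF */

    int32_t adjusted = (int32_t)boot_code + adjustment;
    if (adjusted < 0) {
        adjusted = 0;
    }
    if (adjusted > 15) {
        adjusted = 15;
    }
    return (uint32_t)adjusted;
}
```

With such a helper, each branch of xspi_configure_compensation_cells() would reduce to two calls, for example xspi_comp_adjust(boot_psrc, XSPI_COMP_PSRC_ADJUSTMENT), before the values are passed to HAL_SBS_ConfigCompensationCell(). It also makes the clamping visible: a boot PSRC code of 0 or 1 ends up as 0, so on such a part the applied offset is smaller than the intended 2.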

STOne-32 · Answer

Dear @JenishR,

First, thank you for the detailed description, including the code snippets and video. At first analysis, it is unlikely that you are facing the errata case, as your external flash is running at 50 MHz and the RAM at 100 MHz. The errata case is seen only when running at 150 MHz up to 200 MHz in DDR mode. As temperature is affecting your case, I would suggest checking and sharing your schematics and PCB, and verifying their compatibility with the external memories.

Let us know.

Regards,
STOne-32
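The frequency argument in this answer can be checked with a short calculation. The sketch below assumes the XSPI bus clock is the kernel clock divided by (ClockPrescaler + 1), which is consistent with the 50 MHz and 100 MHz figures quoted in the question for a 200 MHz PLL2S. The function names are hypothetical, and the 150 MHz to 200 MHz window is taken from the answer text rather than from the errata sheet itself.

```c
#include <stdint.h>

/* Assumption: bus clock = kernel clock / (ClockPrescaler + 1).
 * This matches the question: 200 MHz / (3 + 1) = 50 MHz for the flash,
 * 200 MHz / (1 + 1) = 100 MHz for the HyperRAM. */
static uint32_t xspi_bus_clock_hz(uint32_t kernel_hz, uint32_t clock_prescaler)
{
    return kernel_hz / (clock_prescaler + 1U);
}

/* Hypothetical check: returns 1 if the bus clock falls inside the
 * 150 MHz to 200 MHz window named in the answer, 0 otherwise.
 * The DDR-mode condition is not modelled here. */
static int xspi_in_errata_window(uint32_t bus_hz)
{
    return (bus_hz >= 150000000U) && (bus_hz <= 200000000U);
}
```

Under these assumptions, neither memory interface in the question falls inside the window, which is the basis for suggesting a look at the schematics and PCB instead.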

