Graduate II
September 24, 2024
Question

Guide: Injecting and Handling ECC Errors in RAM and Flash on STM32H7


Overview

At my job, I was recently tasked with handling ECC errors in both RAM and flash memory. Additionally, I needed to test the error handling by injecting ECC errors. Figuring out how to do this was a nightmare. I looked over AN5342 countless times, but it was pretty vague. I finally figured out how to do it, and wanted to make this post to help anyone else in the same situation.

 

This post will cover how I set up callbacks for both RAM and flash ECC, as well as how to trigger ECC errors. I am sure there are better ways to do this, and am open to suggestions.

 

Flash ECC Error Handling

1. In the .IOC file, go to System Core -> NVIC and enable the flash global interrupt.

2. Navigate to Core -> Inc -> stm32h7xx_hal_conf.h and add:

 

 

#define USE_FLASH_ECC 1U

 

 

3. I found that when a double-bit ECC error occurs in flash, the hard fault handler is called instead of the flash IRQ handler. To get around this, I added a check in the hard fault handler to see if a double-bit error occurred in flash:

 

 

void HardFault_Handler(void)
{
  /* USER CODE BEGIN HardFault_IRQn 0 */
  // Check if a double-bit ECC error has occurred in either flash bank
  if ((FLASH->SR1 & FLASH_SR_DBECCERR) || (FLASH->SR2 & FLASH_SR_DBECCERR)) {
    FLASH_IRQHandler();
  }
  /* USER CODE END HardFault_IRQn 0 */
  while (1)
  {
    /* USER CODE BEGIN W1_HardFault_IRQn 0 */
    /* USER CODE END W1_HardFault_IRQn 0 */
  }
}
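
One way to make the check in step 3 easier to test off-target is to separate the decision from the register access. The following is a minimal sketch, not part of the HAL: classify_flash_ecc and the SKETCH_ names are hypothetical, and the bit positions (25 for the single-error flag, 26 for the double-error flag) are an assumption that should be checked against FLASH_SR_SNECCERR and FLASH_SR_DBECCERR in the device header for your part.

```c
#include <stdint.h>

/* Assumed bit masks. On target, use FLASH_SR_SNECCERR and FLASH_SR_DBECCERR
 * from the device header instead of these local copies. */
#define SKETCH_SR_SNECCERR ((uint32_t)1U << 25)
#define SKETCH_SR_DBECCERR ((uint32_t)1U << 26)

typedef enum {
    SKETCH_ECC_NONE = 0,
    SKETCH_ECC_SINGLE,  /* single-bit error flag set */
    SKETCH_ECC_DOUBLE   /* double-bit error flag set */
} sketch_ecc_status_t;

/* Hypothetical helper: classify snapshots of the two bank status registers.
 * A double-bit error in either bank takes priority over a single-bit error. */
sketch_ecc_status_t classify_flash_ecc(uint32_t sr1, uint32_t sr2)
{
    uint32_t both = sr1 | sr2;

    if ((both & SKETCH_SR_DBECCERR) != 0U) {
        return SKETCH_ECC_DOUBLE;
    }
    if ((both & SKETCH_SR_SNECCERR) != 0U) {
        return SKETCH_ECC_SINGLE;
    }
    return SKETCH_ECC_NONE;
}
```

On target this could be called as classify_flash_ecc(FLASH->SR1, FLASH->SR2) from the hard fault handler, so the handler itself stays a thin wrapper.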

 

 

4. To enable flash ECC interrupts, I added the following function in main.c:

 

 

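// Flash-word address shared with the test functions below.
// FLASH_USER_START_ADDR is defined by the project and is not shown here.
uint32_t Address;

void init_flash_ecc(void) {
	HAL_FLASH_Unlock();

	Address = FLASH_USER_START_ADDR;
	HAL_NVIC_SetPriority(FLASH_IRQn, 0, 0);
	HAL_NVIC_EnableIRQ(FLASH_IRQn);
	// Single-bit (corrected) and double-bit (detected) ECC interrupts
	HAL_FLASHEx_EnableEccCorrectionInterrupt();
	HAL_FLASHEx_EnableEccDetectionInterrupt();

	HAL_FLASH_Lock();
}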

 

 

5. To define the ECC callbacks, I added the following functions in main.c:

 

 

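// Called when a single-bit error was detected and corrected
void HAL_FLASHEx_EccCorrectionCallback(void) {
	HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_SET);
}

// Called when a double-bit error was detected
void HAL_FLASHEx_EccDetectionCallback(void) {
	HAL_GPIO_WritePin(LD1_GPIO_Port, LD1_Pin, GPIO_PIN_SET);
}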

 

 

 

Flash ECC Testing

1. Call the init_flash_ecc function defined above.

2. Erase the user flash.

3. To cause a single bit error, I use the following function and write data:

 

 

uint64_t SingleErrorA[4] = {
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCC0
};

uint64_t SingleErrorB[4] = {
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCC1
};

void cause_flash_single_error(void) {
	HAL_FLASH_Unlock();

	// First write: program the flash word with pattern B
	if (HAL_FLASH_Program(FLASH_TYPEPROGRAM_FLASHWORD, Address, ((uint32_t) SingleErrorB)) != HAL_OK) {
		Error_Handler();
	}

	// Second write to the same flash word, without an erase in between: pattern A
	if (HAL_FLASH_Program(FLASH_TYPEPROGRAM_FLASHWORD, Address, ((uint32_t) SingleErrorA)) != HAL_OK) {
		Error_Handler();
	}

	// Read the flash word back. volatile keeps the compiler from
	// dropping these otherwise unused reads.
	volatile uint64_t readData[4];
	for (int i = 0; i < 4; i++) {
		readData[i] = *((volatile uint64_t*) (Address + i * 8)); // Read 64 bits at a time
	}
	(void) readData;

	HAL_FLASH_Lock();
}

 

 

4. To cause a double bit error, I use the following function and write data:

 

 

uint64_t DoubleErrorA[4] = {
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCB
};

uint64_t DoubleErrorB[4] = {
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC,
	0xCCCCCCCCCCCCCCCC
};

void cause_flash_double_error(void) {
	HAL_FLASH_Unlock();

	// First write: program the flash word with pattern B
	if (HAL_FLASH_Program(FLASH_TYPEPROGRAM_FLASHWORD, Address, ((uint32_t) DoubleErrorB)) != HAL_OK) {
		Error_Handler();
	}

	// Second write to the same flash word, without an erase in between: pattern A
	if (HAL_FLASH_Program(FLASH_TYPEPROGRAM_FLASHWORD, Address, ((uint32_t) DoubleErrorA)) != HAL_OK) {
		Error_Handler();
	}

	// Read the flash word back. volatile keeps the compiler from
	// dropping these otherwise unused reads.
	volatile uint64_t readData[4];
	for (int i = 0; i < 4; i++) {
		readData[i] = *((volatile uint64_t*) (Address + i * 8)); // Read 64 bits at a time
	}
	(void) readData;

	HAL_FLASH_Lock();
}
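
To see what the two writes do to the data bits, here is a small host-side sketch. It rests on one assumption: programming a flash word that has not been erased can only clear bits, so the stored data ends up as the bitwise AND of the two writes. The ECC field that the hardware stores next to the data is not modeled, so this does not predict whether the hardware reports a single or a double error. The names flash_word_diff_bits, flash_word_overwrite_model and SKETCH_FLASH_WORD_U64S are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 256-bit flash word held as 4 x 64-bit values, like the arrays above. */
#define SKETCH_FLASH_WORD_U64S 4

/* Count the bits that differ between two flash-word patterns. */
unsigned flash_word_diff_bits(const uint64_t *a, const uint64_t *b)
{
    unsigned count = 0;

    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < SKETCH_FLASH_WORD_U64S; i++) {
        uint64_t x = a[i] ^ b[i];
        while (x != 0) {
            x &= x - 1; /* clear the lowest set bit */
            count++;
        }
    }
    return count;
}

/* Assumed model of programming without an erase: bits can only be cleared. */
void flash_word_overwrite_model(uint64_t *cell, const uint64_t *data)
{
    assert(cell != NULL && data != NULL);
    for (size_t i = 0; i < SKETCH_FLASH_WORD_U64S; i++) {
        cell[i] &= data[i];
    }
}
```

Under this model, the SingleErrorA and SingleErrorB patterns differ in 1 bit and the DoubleErrorA and DoubleErrorB patterns differ in 3 bits. Writing ...CC and then ...CB to an erased word leaves ...C8 in the last byte of the data.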

 

 

 

RAM ECC Error Handling

1. In the .IOC file, go to System Core -> RAMECC. Check the boxes next to each region of RAM you want to monitor.

2. In main.c, navigate to the auto-generated function MX_RAMECC_Init. At the top of this function, I initialize all monitored regions of RAM by writing 0 to them. At the bottom, I added the following lines to enable the RAM ECC IRQ:

 

 

 HAL_NVIC_SetPriority(ECC_IRQn, 0, 0);
 HAL_NVIC_EnableIRQ(ECC_IRQn);
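
The zero-fill from step 2 is not shown above, so here is a minimal sketch of what it could look like. ram_ecc_init_region is a hypothetical name, and the start pointer and size are parameters you would take from your own memory map. The 32-bit store width is an assumption: check AN5342 and the reference manual for the ECC word width of each region, since some regions may need wider writes. Only call it on memory that does not hold live variables or the stack, because it overwrites everything in the range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: zero a RAM region using aligned 32-bit stores only.
 * volatile keeps the compiler from merging or removing the stores. */
void ram_ecc_init_region(volatile uint32_t *start, size_t size_bytes)
{
    /* The region must be word aligned and a whole number of words long. */
    assert(((uintptr_t)start % sizeof(uint32_t)) == 0U);
    assert((size_bytes % sizeof(uint32_t)) == 0U);

    for (size_t i = 0; i < size_bytes / sizeof(uint32_t); i++) {
        start[i] = 0U;
    }
}
```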

 

 

3. Enable notifications and start monitoring for each RAMECC handle. I did it using the following function in main.c:

 

 

void enable_ramecc_monitor_notifications(RAMECC_HandleTypeDef *hramecc) {
	if (HAL_RAMECC_EnableNotification(hramecc, (RAMECC_IT_MONITOR_SINGLEERR_R | RAMECC_IT_MONITOR_DOUBLEERR_R)) != HAL_OK) {
		Error_Handler();
	}
	if (HAL_RAMECC_StartMonitor(hramecc) != HAL_OK) {
		Error_Handler();
	}
}

 

 

4. Add the following callback in main.c:

 

 

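void HAL_RAMECC_DetectErrorCallback(RAMECC_HandleTypeDef *hramecc) {
	// Failing address reported by this monitor (not used further in this example)
	uint32_t FAR;

	FAR = HAL_RAMECC_GetFailingAddress(hramecc);
	(void) FAR;

	// Single-bit error: turn on LD1
	if ((HAL_RAMECC_GetRAMECCError(hramecc) & HAL_RAMECC_SINGLEERROR_DETECTED) != 0U) {
		HAL_GPIO_WritePin(LD1_GPIO_Port, LD1_Pin, GPIO_PIN_SET);
	}

	// Double-bit error: turn on LD2
	if ((HAL_RAMECC_GetRAMECCError(hramecc) & HAL_RAMECC_DOUBLEERROR_DETECTED) != 0U) {
		HAL_GPIO_WritePin(LD2_GPIO_Port, LD2_Pin, GPIO_PIN_SET);
	}

	// Clear the handle's error code and show that the callback ran
	hramecc->RAMECCErrorCode = HAL_RAMECC_NO_ERROR;
	HAL_GPIO_WritePin(LD3_GPIO_Port, LD3_Pin, GPIO_PIN_SET);
}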
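
The callback above reads the failing address but only lights LEDs. If you want to keep track of how often errors occur, one option is a small log structure that the callback updates. This is a sketch with hypothetical names (ecc_event_log_t, ecc_log_record), not something the HAL provides. On target the callback would pass in whether the double-error bit was set and the value returned by HAL_RAMECC_GetFailingAddress.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical running totals of ECC events. */
typedef struct {
    uint32_t single_count;         /* single-bit errors seen */
    uint32_t double_count;         /* double-bit errors seen */
    uint32_t last_failing_address; /* value reported with the latest event */
} ecc_event_log_t;

/* Record one ECC event in the log. */
void ecc_log_record(ecc_event_log_t *log, bool is_double, uint32_t failing_address)
{
    if (is_double) {
        log->double_count++;
    } else {
        log->single_count++;
    }
    log->last_failing_address = failing_address;
}
```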

 

 

5. Navigate to Core -> Src -> stm32h7xx_it.c. Add the function ECC_IRQHandler in the user code section, and add a check for each enabled RAM ECC monitor to see whether it has any flags raised. This lets the IRQ handler pass the respective ECC handle to the callback. Here is mine:

 

 

void ECC_IRQHandler(void)
{
  if (__HAL_RAMECC_GET_FLAG(&hramecc1_m1, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc1_m1);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc1_m2, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc1_m2);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc1_m3, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc1_m3);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc1_m4, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc1_m4);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc1_m5, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc1_m5);
  }

  if (__HAL_RAMECC_GET_FLAG(&hramecc2_m1, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc2_m1);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc2_m2, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc2_m2);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc2_m3, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc2_m3);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc2_m4, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc2_m4);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc2_m5, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc2_m5);
  }

  if (__HAL_RAMECC_GET_FLAG(&hramecc3_m1, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc3_m1);
  }
  if (__HAL_RAMECC_GET_FLAG(&hramecc3_m2, RAMECC_FLAGS_ALL)) {
    HAL_RAMECC_IRQHandler(&hramecc3_m2);
  }
}
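
The handler above repeats the same check for every monitor. A table-driven loop is one possible alternative. The sketch below uses stand-in types so it builds on a host: mock_ramecc_t, mock_flags_pending, mock_irq_handler and ramecc_dispatch are all hypothetical. On target the table would hold the RAMECC_HandleTypeDef pointers, the check would be __HAL_RAMECC_GET_FLAG with RAMECC_FLAGS_ALL, and the handler would be HAL_RAMECC_IRQHandler.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RAMECC_HandleTypeDef so the sketch builds on a host. */
typedef struct {
    uint32_t flags;         /* stand-in for the monitor's status flags */
    uint32_t handled_count; /* incremented by the stand-in handler */
} mock_ramecc_t;

/* Stand-in for the flag check done per monitor. */
static int mock_flags_pending(const mock_ramecc_t *h)
{
    return h->flags != 0U;
}

/* Stand-in for the per-monitor IRQ handler: clears flags, counts the call. */
static void mock_irq_handler(mock_ramecc_t *h)
{
    h->flags = 0U;
    h->handled_count++;
}

/* Walk a table of monitors and service each one that has a flag raised.
 * Returns the number of monitors serviced. */
size_t ramecc_dispatch(mock_ramecc_t *const monitors[], size_t count)
{
    size_t serviced = 0;

    for (size_t i = 0; i < count; i++) {
        if (mock_flags_pending(monitors[i])) {
            mock_irq_handler(monitors[i]);
            serviced++;
        }
    }
    return serviced;
}
```

The benefit is that adding or removing a monitored region only changes the table, not the handler body.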

 

 

 

RAM ECC Testing

1. In step 2 of the RAM ECC Error Handling section, I said to initialize all monitored regions of RAM by writing 0 to them. To cause an ECC error in a specific region of RAM, skip this initialization for the region you want to test.

2. In the main function of main.c, I call a function that reads from all regions of RAM. When this function reads from the region of RAM I did not initialize in step 1, the ECC callback is triggered.
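
The read function from step 2 is not shown, so here is a minimal sketch of one way to write it. ram_read_region is a hypothetical name, and the start pointer and size come from your own memory map. The volatile qualifier forces the compiler to perform every read, and returning an XOR of the values gives the caller something to use. On a host this only demonstrates the access pattern; the ECC check itself happens in hardware on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read every 32-bit word in a region once.
 * Returns the XOR of all words so the result depends on every read. */
uint32_t ram_read_region(const volatile uint32_t *start, size_t size_bytes)
{
    uint32_t acc = 0U;

    /* The region must be a whole number of words long. */
    assert((size_bytes % sizeof(uint32_t)) == 0U);

    for (size_t i = 0; i < size_bytes / sizeof(uint32_t); i++) {
        acc ^= start[i];
    }
    return acc;
}
```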

 

Conclusion

I did the best I could with the time I had to write this guide, so it is not perfect, but my hope is that it can help someone and make this process easier for people scouring the forum to try and find concrete examples of how to implement and test ECC.

-Jared

 


    Graduate
    October 8, 2024

    Hey Jared. I've been struggling to trigger a flash ECC error. I followed AN5342 to the best of my ability but couldn't figure it out.
    Thanks a lot for your guide. Helped me out a lot.

    And yes, a Double Error Detect always triggers the hard fault handler after the memory read, whether I use interrupts or just poll the error flag. A Single Error Correction works just fine.

    What I noticed with the memory analyzer is that when you make the second flash write to trigger the DED, it seems to either corrupt the data or lose access after the failed ECC check. I am not sure which. In that case, performing a read of the memory would trigger the hard fault handler. See attached.

    Memory after the flash erase: Capture0.PNG

    Memory after the first write (I am writing 0xCCCCCCC3 at the end): Capture1.PNG

    Memory after the second write (I write 0xCCCCCCC0 to trigger the DED): Capture2.PNG

    Anyway, I'm not sure if that is what should be expected or not.

    Graduate II
    October 8, 2024

    Hello,

    I'm glad the guide was able to help.

    The behavior you are seeing with the memory analyzer aligns with what I was seeing.

     

    Out of curiosity, what sort of application are you working on that requires handling ECC errors?

    - Jared

    Graduate
    October 10, 2024

    It's not a requirement. I'm just curious whether ECC errors occur in an industrial setting and, if so, how often.

    ST Employee
    October 16, 2024

    Thanks @20jmorrison ,

    do you mind if we incorporate parts of your work here in a future update of AN5342?

    BR,

    J

    Graduate II
    October 16, 2024

    First off, incredible username and profile picture @Bubbles , lol.

    Absolutely, feel free to use any/all parts of it that you see fit.

    -Jared

    ST Employee
    October 25, 2024

    You could say I'm a huge fan, including the actor's musical career.