Graduate
September 5, 2025
Solved

One-time UART DMA data mismatch

Hello all, 

This is my first post here; if I am in error at any point, please direct me to the relevant posting guidelines. 

I am developing a sender-receiver solution involving two STM32H753 microcontrollers, both on ST's NUCLEO-STM32H753 boards. I am using USART3 to transmit and receive one byte using DMA. The sender transmits one byte (for example 'S'); the receiver receives that byte, checks whether it matches 'S', and toggles the yellow on-board LED if it does or the green on-board LED if it does not. This check is done in the HAL_UART_RxCpltCallback() function. 

The issue: the first time HAL_UART_RxCpltCallback() fires after the receiver is reset, it toggles the green LED once (the received byte did not match the expected data). After that it only ever toggles the yellow LED (the received byte matches the expected data, which is the desired behaviour). When I ran the receiver under the debugger, the problem did not appear: only the yellow LED toggled, i.e., the received byte always matched the expected 'S'. 

So, in the debugger, everything works fine. But when not debugging, the first iteration of data does not match but all subsequent ones do. 

Below I have snippets for the UART Callback functions and main functions for the transmitter and receiver. I also have a screenshot of STM32CubeIDE that shows in the debug view the matching data in rx_buffer on the first iteration. 

// ------------------------------------------------------------------------------
// Receiver Code:
/* USER CODE BEGIN 0 */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) {
	if (huart == &huart3) {
		uart_rx_complete = 1;
		callback_count++;
		if (rx_buffer[0] == 'S') {
			HAL_GPIO_TogglePin(LD2_GPIO_Port, LD2_Pin); // Matched!
		} else {
			HAL_GPIO_TogglePin(LD1_GPIO_Port, LD1_Pin); // Did not match!
		}
	}
	SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buffer, 1);
	HAL_UART_Receive_DMA(&huart3, rx_buffer, 1);
}
/* USER CODE END 0 */

/**
 * @brief The application entry point.
 * @retval int
 */
int main(void) {

	/* USER CODE BEGIN 1 */

	/* USER CODE END 1 */

	/* MPU Configuration--------------------------------------------------------*/
	MPU_Config();

	/* Enable the CPU Cache */

	/* Enable I-Cache---------------------------------------------------------*/
	SCB_EnableICache();

	/* Enable D-Cache---------------------------------------------------------*/
	SCB_EnableDCache();

	/* MCU Configuration--------------------------------------------------------*/

	/* Reset of all peripherals, Initializes the Flash interface and the Systick. */
	HAL_Init();

	/* USER CODE BEGIN Init */

	/* USER CODE END Init */

	/* Configure the system clock */
	SystemClock_Config();

	/* USER CODE BEGIN SysInit */

	/* USER CODE END SysInit */

	/* Initialize all configured peripherals */
	MX_GPIO_Init();
	MX_DMA_Init();
	MX_USART6_UART_Init();
	MX_USART3_UART_Init();
	/* USER CODE BEGIN 2 */
	SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buffer, 1);
	HAL_UART_Receive_DMA(&huart3, rx_buffer, 1);

	/* USER CODE END 2 */

	/* Infinite loop */
	/* USER CODE BEGIN WHILE */
	while (1) {

		/* USER CODE END WHILE */

		/* USER CODE BEGIN 3 */
	}
	/* USER CODE END 3 */
}

// ------------------------------------------------------------------------------
// Transmitter Code 
/* USER CODE BEGIN 0 */
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart) {
	if (huart == &huart3) {
		uart_tx_complete = 1;
		callback_count++;
		HAL_GPIO_TogglePin(LD3_GPIO_Port, LD3_Pin);
	}
}
/* USER CODE END 0 */

/**
 * @brief The application entry point.
 * @retval int
 */
int main(void)
{

 /* USER CODE BEGIN 1 */

 /* USER CODE END 1 */

 /* MPU Configuration--------------------------------------------------------*/
 MPU_Config();

 /* Enable the CPU Cache */

 /* Enable I-Cache---------------------------------------------------------*/
 SCB_EnableICache();

 /* Enable D-Cache---------------------------------------------------------*/
 SCB_EnableDCache();

 /* MCU Configuration--------------------------------------------------------*/

 /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
 HAL_Init();

 /* USER CODE BEGIN Init */

 /* USER CODE END Init */

 /* Configure the system clock */
 SystemClock_Config();

 /* USER CODE BEGIN SysInit */

 /* USER CODE END SysInit */

 /* Initialize all configured peripherals */
 MX_GPIO_Init();
 MX_DMA_Init();
 MX_USART3_UART_Init();
 /* USER CODE BEGIN 2 */
	uint8_t data[1] = {'S'};
 /* USER CODE END 2 */

 /* Infinite loop */
 /* USER CODE BEGIN WHILE */
	while (1) {
		if (uart_tx_complete) {
			SCB_CleanDCache_by_Addr((uint32_t*)data, 1);
			if (HAL_UART_Transmit_DMA(&huart3, data, 1) != HAL_OK) {
				HAL_GPIO_WritePin(LD3_GPIO_Port, LD3_Pin, 1);
				Error_Handler();
			}
		}
		HAL_GPIO_TogglePin(LD1_GPIO_Port, LD1_Pin);
		HAL_Delay(500);
 /* USER CODE END WHILE */

 /* USER CODE BEGIN 3 */
	}
 /* USER CODE END 3 */
}

 

STM32CubeIDE Debug session:
Screenshot from 2025-09-05 10-48-10.png
 

    Best answer by bmckenney

    4 replies

    Graduate
    September 5, 2025

    I checked again, and it looks like the problem does exist in debug too. If the first place I break is line 73, the if statement `if (rx_buffer[0] == 'S')`, the debugger shows that rx_buffer[0] is 0x0. This is strange: I would expect the DMA callback to fire only after data was received, so the buffer should hold whatever came in over UART. So for the first iteration, the data is not showing up. 

    bmckenney (Answer)
    Explorer II
    September 5, 2025
    	SCB_InvalidateDCache_by_Addr((uint32_t*)rx_buffer, 1);

     I suggest you move this to precede the read of rx_buffer[0] (up at the top of the if() block). When you do the read it's been a "long time" since the Invalidate, and it's not unlikely some neighboring variable caused a re-read of the cache line. 

    If I can, I set aside one of the alternate SRAM blocks (SRAM1, e.g.) for DMA buffers, and just keep it non-cacheable. A bit wasteful, maybe, but saves a lot of headaches.
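To see why the ordering matters, here is a host-side toy model. It is not HAL code: `toy_line_t` and every `toy_*` / `rx_*` name is made up for illustration. It assumes the CPU touches a neighbouring variable in the same cache line between arming the DMA and the callback, which pulls a stale copy of rx_buffer[0] into the cache. Under that assumption the original order produces exactly one mismatch after reset, and invalidating just before the read produces none:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of the single D-cache line that holds rx_buffer[0].
 * ram   : what the DMA controller writes (DMA bypasses the cache)
 * cache : what the CPU reads while the line is valid */
typedef struct {
    uint8_t ram;
    uint8_t cache;
    bool    valid;
} toy_line_t;

static void toy_invalidate(toy_line_t *l)           { l->valid = false; }
static void toy_dma_write(toy_line_t *l, uint8_t v) { l->ram = v; }

static uint8_t toy_cpu_read(toy_line_t *l) {
    if (!l->valid) {            /* cache miss: refill the line from RAM */
        l->cache = l->ram;
        l->valid = true;
    }
    return l->cache;
}

/* Original order: compare first, invalidate afterwards. */
static bool rx_read_then_invalidate(toy_line_t *l) {
    bool match = (toy_cpu_read(l) == 'S');
    toy_invalidate(l);
    return match;
}

/* Suggested order: invalidate immediately before the compare. */
static bool rx_invalidate_then_read(toy_line_t *l) {
    toy_invalidate(l);
    return toy_cpu_read(l) == 'S';
}

/* One receive cycle: a neighbouring access refills the line, the DMA
 * then delivers 'S' to RAM, and finally the callback runs. */
static bool toy_cycle(toy_line_t *l, bool (*callback)(toy_line_t *)) {
    assert(l != NULL && callback != NULL);
    (void)toy_cpu_read(l);      /* neighbour access pulls in the stale line */
    toy_dma_write(l, 'S');
    return callback(l);
}
```

On the target, `rx_invalidate_then_read` corresponds to calling SCB_InvalidateDCache_by_Addr() on rx_buffer as the first statement inside the `if (huart == &huart3)` block, before rx_buffer[0] is compared.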

    Graduate
    September 5, 2025

    This fixed it. Could you point me to some resources for doing what you are suggesting, i.e. keeping all DMA buffers in a non-cacheable region of memory? 

    Thanks! 

    Graduate II
    September 5, 2025

    Yes, you want to read what's in memory rather than what's still in the cache.

    The buffer also needs to be volatile to force the compiler to read the new content.

    Generally, yes it's better to have uncached regions, on the F7 this could be done using DTCM, but the H7 is more aggravating in this regard, so cache coherency must be managed on input/output buffers appropriately.

    The structures need to be on 32-byte boundaries. Invalidate blows away pending writes, so avoid placing buffers inside structures alongside other variables you'd be using in close proximity, i.e. a FIFO buffer with head/tail pointers falling within the same 32-byte cache line.

    Like I said, there's a lot of friction to using 1-byte DMA, so better to do enough to reward the effort.
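A minimal sketch of the layout described above, assuming a C11 compiler and the 32-byte line size mentioned in this thread. `rx_fifo_t`, `DCACHE_LINE` and the field names are hypothetical; the only point is that the DMA-written bytes get whole cache lines to themselves while the head/tail indices live in a different line:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u   /* D-cache line size assumed in this thread */

typedef struct {
    /* DMA-written storage: starts on a line boundary and fills whole
     * lines, so invalidating it cannot discard anyone else's writes. */
    alignas(DCACHE_LINE) volatile uint8_t data[DCACHE_LINE];

    /* CPU-only bookkeeping, pushed into the following cache line. */
    alignas(DCACHE_LINE) volatile uint32_t head;
    volatile uint32_t tail;
} rx_fifo_t;

/* Compile-time checks that the layout is what the comments claim. */
static_assert(offsetof(rx_fifo_t, data) % DCACHE_LINE == 0,
              "data must start on a cache-line boundary");
static_assert(offsetof(rx_fifo_t, head) >= sizeof(((rx_fifo_t *)0)->data),
              "head/tail must not share a line with data");
static_assert(sizeof(rx_fifo_t) % DCACHE_LINE == 0,
              "struct must occupy whole cache lines");

static rx_fifo_t rx_fifo;   /* inherits the 32-byte alignment of its type */
```

The `volatile` qualifiers cover the compiler side of the problem; they do not replace the cache invalidate, which is still needed before reading `data`.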

    Super User
    September 5, 2025

    In addition to invalidating before you read the data, you also need to ensure rx_buffer is cache-line aligned and that nothing else occupies that cache line. The easiest way to do this is to align it to 32 bytes and make it the size of a cache line.

    Graduate II
    September 5, 2025

    The required alignment is 32 bytes, but that is also the minimum width, so surrounding data is subject to collateral damage.

    DMA for ONE byte seems to introduce a lot of friction for zero benefit.

    Check error handling situations, i.e. where the UART status reports noise, framing or parity errors, and check the return values from the HAL_UART_... calls.
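Because cache maintenance works on whole 32-byte lines, it can help to work out which lines a buffer really touches before invalidating or cleaning it. The helpers below are a sketch with hypothetical names (`dcache_line_start`, `dcache_span`), written in plain C with no CMSIS dependency:

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u   /* D-cache line size assumed in this thread */

/* Round an address down to the start of the cache line containing it. */
static inline uintptr_t dcache_line_start(uintptr_t addr) {
    return addr & ~(uintptr_t)(DCACHE_LINE - 1u);
}

/* Number of bytes, always a multiple of 32, occupied by the cache
 * lines that overlap the range [addr, addr + len). */
static inline size_t dcache_span(uintptr_t addr, size_t len) {
    uintptr_t start = dcache_line_start(addr);
    uintptr_t end   = (addr + len + (DCACHE_LINE - 1u))
                      & ~(uintptr_t)(DCACHE_LINE - 1u);
    return (size_t)(end - start);
}
```

For a 1-byte buffer at an unaligned address the span is still 32 bytes, and a 2-byte buffer that straddles a boundary costs 64. Everything else inside that span is affected by the invalidate, which is the collateral damage described above.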

    Explorer II
    September 8, 2025

    3A) The MPU setup might resemble:

    // 16KB DMA buffer region at the beginning of RAM_D2
    #define DMABUF_BASE 0x30000000UL // RAM_D2 from the .ld
    #define DMABUF_LOG_SIZE 13u // log2(16K)-1
    MPU->RNR = (1u << MPU_RNR_REGION_Pos); // Select Region 1
    MPU->RBAR = DMABUF_BASE | (0*MPU_RBAR_VALID_Msk); // Address, VALID=0 to use RNR
    MPU->RASR =
     (3u << MPU_RASR_AP_Pos) | // Full access per DDI0403E Table B3-15
     (1u << MPU_RASR_TEX_Pos) | (0*MPU_RASR_C_Msk) | (0*MPU_RASR_B_Msk) | // Non-cacheable per Table B3-13
     MPU_RASR_S_Msk | // Shareable
     (0u << MPU_RASR_SRD_Pos) | // All subregions enabled
     (DMABUF_LOG_SIZE << MPU_RASR_SIZE_Pos) | // Size 16K
     MPU_RASR_ENABLE_Msk; // Enable Region
    MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk; // Enable MPU

    [I don't have an appropriate MCU here to try this on, but I think it's about right.]
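The `DMABUF_LOG_SIZE` value above follows the ARMv7-M rule that a region is 2^(SIZE+1) bytes, with a 32-byte minimum, and that the base address must be a multiple of the region size. A small host-side check of that arithmetic, with hypothetical helper names (`mpu_size_field`, `mpu_base_is_aligned`):

```c
#include <assert.h>
#include <stdint.h>

/* Return the RASR.SIZE encoding for a region of `bytes` bytes, where
 * region size = 2^(SIZE+1). Returns 0 (not a valid encoding) if `bytes`
 * is smaller than 32 or is not a power of two. */
static uint32_t mpu_size_field(uint32_t bytes) {
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return 0u;
    }
    uint32_t log2 = 0u;
    while ((bytes >> log2) != 1u) {
        log2++;
    }
    return log2 - 1u;
}

/* A region base must be aligned to the region size (a power of two). */
static int mpu_base_is_aligned(uint32_t base, uint32_t bytes) {
    assert(bytes != 0u && (bytes & (bytes - 1u)) == 0u);
    return (base & (bytes - 1u)) == 0u;
}
```

For the 16 KB region at 0x30000000 this gives a SIZE field of 13 and an aligned base, matching the defines above. Buffers still have to be placed in that address range by the linker script or a section attribute, which is not shown here.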