Visitor II
October 14, 2025
Question

STM32 + Quectel BG95 Modem: System hangs after ~60 days - UART Idle Line setup never returns


Background

We have a battery-powered device using:

  • MCU: STM32 (STM32L4 series)
  • Modem: Quectel BG95 (cellular)
  • Communication: UART with Idle Line Interrupt (HAL_UARTEx_ReceiveToIdle_IT)
  • Power mode: Device goes to sleep periodically, wakes up to communicate with modem

Architecture Pattern

Our communication pattern:

  1. Device wakes from low-power mode
  2. Call low-level AT command function that:
    • Enables UART RX Interrupt with Idle Line detection (HAL_UARTEx_ReceiveToIdle_IT)
    • Sends AT command
    • Waits for response with timeout
    • Disables interrupt (HAL_UART_Abort_IT)
  3. Process response
  4. Go back to sleep

This means we re-enable/disable UART Idle Line interrupt on every transaction (potentially hundreds of times per day).

UART + Ring Buffer Setup:

#define RING_BUFFER_SIZE 2048
#define ISR_BUFFER_SIZE 1024

typedef struct {
    lwrb_t rb;                           // lwrb ring buffer
    volatile bool rx_data_ready_flag;    // Flag set by ISR
    uint8_t rb_buffer[RING_BUFFER_SIZE]; // Ring buffer storage
    uint8_t isr_buffer[ISR_BUFFER_SIZE]; // Intermediate buffer for Idle Line ISR
} uart_rb_t;

uart_rb_t modem_rb;

ISR Callback:

void HAL_UARTEx_RxEventCallback(UART_HandleTypeDef *huart, uint16_t Size) {
    if (huart == &MODEM_UART) {
        // Save data from ISR buffer to ring buffer
        lwrb_write(&modem_rb.rb, modem_rb.isr_buffer, Size);
        modem_rb.rx_data_ready_flag = true;

        // Re-enable Idle Line interrupt
        int retries = 10;
        do {
            if (HAL_UARTEx_ReceiveToIdle_IT(&MODEM_UART,
                                            modem_rb.isr_buffer,
                                            sizeof(modem_rb.isr_buffer)) == HAL_OK) {
                break;
            }
            retries--;
        } while (retries > 0);
    }
}

The Problem

After ~60 days of continuous operation, the device completely froze with no recovery.

Symptoms:

  • System printed the last log line: >>StartParse >>>>
  • Then complete silence - no further output
  • No HardFault triggered (we have handler with reset + logging - it was never called)
  • Device required power cycle

Last logs before hang:

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"AT+QISTATE=1,1" --> [response received OK]

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"AT+QISTATE=1,1" --> [response received OK]

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"AT+QISTATE=1,1" --> [response received OK]

<<EndParse <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>StartParse >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[SYSTEM FROZE HERE - no further output]

Notice: The command string was not printed in the last call, suggesting the code hung before that point.

Code Structure (simplified)

Parent function:

uint32_t tick = HAL_GetTick();
do {
    return_code = modem_send_command_wait_parse_result(
        "AT+QISTATE=1,1",
        "+QISTATE:",
        /* parsing params */,
        300 /* timeout ms */
    );

    if (condition_met) break;
    HAL_Delay(100);

} while (HAL_GetTick() - tick < 15000); // 15 second outer timeout

Low-level function structure:

int modem_send_command_wait_parse_result(..., int timeout, ...) {
    // Local buffers
    char formatted_command[1024] = {0};
    char buffer_final[2048] = {0};
    unsigned int total_bytes = 0;

    printf(">>StartParse >>>>\n");
    if (command_to_send != NULL) {
        printf("\"%s\" --> ", command_to_send);
    }

    // Enable UART RX Interrupt with Idle-line detection
    int retries = 10;
    do {
        if (HAL_UARTEx_ReceiveToIdle_IT(&MODEM_UART,
                                        modem_rb.isr_buffer,
                                        sizeof(modem_rb.isr_buffer)) == HAL_OK) {
            break;
        }
        retries--;
    } while (retries > 0);

    if (retries == 0) {
        return_code = -1;
    }

    uint32_t tick = HAL_GetTick();

    // Main receive loop with timeout
    while (return_code > 0 && ((HAL_GetTick() - tick) < timeout)) {
        // Check flag and read from ring buffer
        if (modem_rb.rx_data_ready_flag) {
            modem_rb.rx_data_ready_flag = false;

            int bytes = lwrb_read(&modem_rb.rb,
                                  &buffer_final[total_bytes],
                                  sizeof(buffer_final) - total_bytes);
            total_bytes += bytes;
        }

        // Parse response, check for expected strings, etc.
        // ...
    }

    // Cleanup
    HAL_UART_Abort_IT(&MODEM_UART);
    lwrb_reset(&modem_rb.rb);

    return return_code;
}
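
A side note on the receive loop above: if the elided parsing step treats buffer_final as a C string, the buffer needs room for a terminating NUL even when it fills up. Below is a minimal sketch of a bounded append, assuming that is how the parser works. append_bounded is a hypothetical helper, not part of lwrb or the HAL; in the real loop the computed room would be passed as the length argument to lwrb_read rather than to memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies up to len bytes from src into dst, starting
 * after the first *used bytes, and always leaves room for a terminating NUL.
 * Returns the number of bytes actually appended. */
size_t append_bounded(char *dst, size_t dst_size, size_t *used,
                      const void *src, size_t len)
{
    if (dst_size == 0 || *used >= dst_size - 1) {
        return 0; /* only the NUL slot (or nothing) is left */
    }
    size_t room = dst_size - 1 - *used;   /* reserve one byte for the NUL */
    size_t n = (len < room) ? len : room; /* truncate instead of overflowing */
    memcpy(dst + *used, src, n);
    *used += n;
    dst[*used] = '\0';                    /* keep the buffer a valid C string */
    return n;
}
```

With this shape, a full buffer makes the append return 0 instead of silently losing the terminator, so the caller can treat it as an explicit overflow condition.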

 

My Questions

  1. Is re-enabling UART Idle Line interrupt on every transaction a valid approach? Could repeatedly calling HAL_UARTEx_ReceiveToIdle_IT (with 1KB buffer) → HAL_UART_Abort_IT → HAL_UARTEx_ReceiveToIdle_IT (hundreds of times over 60 days) cause UART peripheral corruption?
  2. In the low-level function, the retry loop that enables Idle Line reception sometimes gets a result other than HAL_OK (HAL_BUSY or HAL_ERROR) from HAL_UARTEx_ReceiveToIdle_IT. Is this expected behavior, or does it indicate underlying UART state problems that could accumulate over time?
  3. Race condition in flag handling? The rx_data_ready_flag is set in ISR and cleared in main loop without atomic operations. Could this cause issues:
     // Main loop (non-atomic):
     if (modem_rb.rx_data_ready_flag) {        // Read
         modem_rb.rx_data_ready_flag = false;  // Write - ISR could interrupt here!
     }
  4. Could printf() cause the hang? We use printf extensively for debugging over a separate UART. Could printf buffer overflow or UART TX blocking cause the system to freeze without triggering exceptions?
  5. HAL_GetTick() overflow handling: After about 49.7 days (2^32 ms), HAL_GetTick() wraps around. Our timeout check is (HAL_GetTick() - tick) < timeout. Is this safe with overflow?
  6. Ring buffer overflow: If the 2KB ring buffer fills up and data is lost, could this cause the expected response string to never arrive, leading to timeout? Though this should be caught by the timeout logic...
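
On the flag-handling question: as far as the snippet shows, the flag is cleared before lwrb_read runs, so an ISR that fires between the test and the clear leaves its data in the ring buffer, where the following read picks it up. If the test-and-clear should still be a single step, one option is a C11 atomic exchange. This is a minimal sketch, assuming a toolchain that provides <stdatomic.h>; rx_flag_set_from_isr and rx_flag_take are hypothetical names, and the global stands in for modem_rb.rx_data_ready_flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: stands in for modem_rb.rx_data_ready_flag from the post. */
atomic_bool rx_data_ready_flag = false;

/* Would be called from the RX event callback after lwrb_write(). */
void rx_flag_set_from_isr(void)
{
    atomic_store(&rx_data_ready_flag, true);
}

/* Called from the main loop: returns the previous value and clears the
 * flag in one indivisible step. The caller drains the ring buffer only
 * after this returns true, so data written by a later ISR is either
 * drained in this pass or flagged again for the next one. */
bool rx_flag_take(void)
{
    return atomic_exchange(&rx_data_ready_flag, false);
}
```

The main loop would then read `if (rx_flag_take()) { /* lwrb_read(...) */ }` in place of the separate test and assignment.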

Request for advice:

  • Are we using UART Idle Line correctly for this use case (repeated enable/disable)?
  • Is the flag handling race condition a real concern, or is it benign?
  • Any known issues with long-term UART peripheral usage on STM32?
  • Could the OPEN LOG dev board that we have connected to our device (to collect logs onto an SD card) be the issue?

    5 replies

    Super User
    October 14, 2025

    You should be able to attach a debugger without resetting the device to examine its state.

    There is nothing inherent to the device which stops working after X days. It has to be a bug in the code somewhere.

    1 ms * 2^32 is 49.7 days. It is possible that you have an issue with a timeout that uses SysTick, but the HAL functions handle this overflow correctly.
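
To illustrate the overflow point: the subtraction form used in the post, (HAL_GetTick() - tick) < timeout, stays correct across the wrap because unsigned arithmetic is modulo 2^32. The form that breaks is a comparison against a precomputed deadline. The sketch below is host-side code with hypothetical helper names; it takes the tick values as plain arguments so that it does not depend on the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of the HAL): true once timeout_ms
 * milliseconds have passed since start_ms, given the current tick now_ms.
 * Unsigned subtraction is modulo 2^32, so the difference stays correct
 * across a single wrap of the tick counter. */
bool timeout_expired(uint32_t now_ms, uint32_t start_ms, uint32_t timeout_ms)
{
    return (uint32_t)(now_ms - start_ms) >= timeout_ms;
}

/* Fragile variant, shown for contrast: start_ms + timeout_ms can wrap to a
 * small number, so the comparison reports "expired" immediately when the
 * start time is close to the wrap point. */
bool timeout_expired_fragile(uint32_t now_ms, uint32_t start_ms, uint32_t timeout_ms)
{
    return now_ms >= (uint32_t)(start_ms + timeout_ms);
}
```

For example, with start_ms = 0xFFFFFF00 and a 300 ms timeout, the first form reports 272 ms elapsed at tick 0x00000010, while the fragile form already reports expiry one tick after the start.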

    Super User
    October 14, 2025

    Is this just one isolated instance on one particular device, or are you seeing many such occurrences?

    Visitor II
    October 14, 2025

    This is the first occurrence we've seen. We have 3 devices in the field for 2 months, and this is the only one that froze after 60 days. However, we're concerned it may be a latent bug that will affect others.

    Super User
    October 14, 2025

    @GR88_gregni wrote:
    • Are we using UART Idle Line correctly for this use case (repeated enable/disable)?

    So what, exactly, is the purpose of the Idle Line interrupt in your system?

    You seem to be just doing AT commands - that can be (usually is?) done without Idle Line detection...

    Visitor II
    October 14, 2025

    We use Idle Line interrupt to detect when the BG95 modem has finished transmitting its response, since AT command responses are variable-length and we don't know in advance how many bytes to expect.

    Super User
    October 14, 2025

    @GR88_gregni wrote:

    AT command responses are variable-length and we don't know in advance how many bytes to expect.


    But they have well-defined termination criteria.

    The usual approach is to look for the termination.
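
A minimal sketch of that approach, assuming verbose result codes (ATV1) and the default <CR><LF> framing. at_response_complete is a hypothetical helper, the input must be NUL-terminated, and the set of final result codes checked here is limited to OK, ERROR, +CME ERROR and +CMS ERROR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if resp contains a line that starts with prefix and is already
 * terminated by <CR><LF>. prefix itself must start with "\r\n". */
static bool has_terminated_line(const char *resp, const char *prefix)
{
    const char *hit = strstr(resp, prefix);
    /* Skip the leading "\r\n" of the match, then look for the line's end. */
    return hit != NULL && strstr(hit + 2, "\r\n") != NULL;
}

/* Hypothetical helper: true once the accumulated response text contains a
 * complete final result code, meaning the modem has finished answering. */
bool at_response_complete(const char *resp)
{
    return strstr(resp, "\r\nOK\r\n") != NULL
        || strstr(resp, "\r\nERROR\r\n") != NULL
        || has_terminated_line(resp, "\r\n+CME ERROR:")
        || has_terminated_line(resp, "\r\n+CMS ERROR:");
}
```

The receive loop would call this on the accumulated buffer after each read and stop as soon as it returns true, with the timeout kept as a fallback. A command whose payload can itself contain a line reading OK would need stricter matching, and unsolicited result codes are not handled here.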

    Super User
    October 14, 2025

    Did you resolve your Long Blocking Operations Dilemma?

    Having very long blocking delays sounds risky...

     

    Visitor II
    October 14, 2025

    Yes I have resolved that.

    Super User
    October 14, 2025

    Then please feed back in that thread and mark the solution.

    Visitor II
    November 5, 2025

    @TDK @Andrew Neil  I ran my program under the debugger and saw that, when HAL_UARTEx_ReceiveToIdle_IT was called, execution stopped inside this function in stm32l4xx_hal_uart.c:

    void HAL_UART_IRQHandler(UART_HandleTypeDef *huart)
    {
        uint32_t isrflags = READ_REG(huart->Instance->ISR);
        uint32_t cr1its   = READ_REG(huart->Instance->CR1); // Stopped here
        uint32_t cr3its   = READ_REG(huart->Instance->CR3);

        uint32_t errorflags;
        uint32_t errorcode;
        .....
    }

    The call stack in debug mode is the following (innermost frame first):

    HAL_UART_IRQHandler()   -> stopped at the line mentioned above
    USART1_IRQHandler()     -> at HAL_UART_IRQHandler(&huart1);
    UART_Start_Receive_IT() -> at UART_MASK_COMPUTATION(huart); /* Computation of UART mask to apply to RDR register */
    HAL_UARTEx_ReceiveToIdle_IT()
    modem_send_command_wait_parse_result()
    modem_open_connection()
    udp_FSM_open_socket()

    I saw the UART1 ISR register and the only bits that are asserted are: 
    IDLE
    RXNE
    TC
    TXE
    EOBF

    From the register side:
    sp = 0x20016f60
    lr = 134363963 (0x08023B3B)

    Disassembly shows this:

    2294 uint32_t cr1its = READ_REG(huart->Instance->CR1);
    0802b8f6: ldr r3, [r7, #4]
    0802b8f8: ldr r3, [r3, #0]    ; <-- this is the instruction I mean
    0802b8fa: ldr r3, [r3, #0]
    0802b8fc: str.w r3, [r7, #224] @ 0xe0



    I can't understand why the command is not printed by modem_send_command_wait_parse_result. The debugger doesn't show anything wrong: the command is passed into the function normally and I can see it there, but it never appears in the logs. I also can't understand why execution gets stuck at this point and leaves the MCU idle; it stays stuck there for hours without doing anything.

    When I hit Run, the code continues correctly. I saw that in the low-level function modem_send_command_wait_parse_result(), the first call to HAL_UARTEx_ReceiveToIdle_IT failed, the retry loop called it a second time, and execution stopped there. But when I hit the play button, it continued.