STM32G431 I2C with HAL gets stuck in the interrupt
I am trying to use a G431 as an I2C device (slave). Unfortunately it can be made to latch up rather easily, getting stuck with an endlessly triggering I2C interrupt. It appears to be a flaw in the HAL code where it doesn't clear the interrupt flags, possibly due to its internal state machine being out of sync with reality.
There are two situations that can cause this to happen.
1. The bus host (master) starts transactions back to back too quickly. Adding a 1 ms delay between transactions fixes the problem.
2. DMA is used for reception, but the host stops sending data early. I.e. HAL_I2C_Slave_Seq_Receive_DMA() is called with a certain buffer size, but the number of bytes sent by the host is less than that size.
I thought that switching to interrupts instead of DMA might help, but it doesn't. The rate at which interrupts fire seems to be enough to trigger the stuck-in-interrupt condition, but I have not debugged it in depth yet.
A reasonable implementation of the DMA mode should allow recovery from this situation. I have tried adding code to HAL_I2C_ErrorCallback() that resets both DMA and the I2C peripheral as follows:
HAL_I2C_DeInit(hi2c);
HAL_DMA_DeInit(&hdma_i2c2_rx);
HAL_DMA_DeInit(&hdma_i2c2_tx);
HAL_I2C_Init(hi2c);
HAL_I2C_EnableListen_IT(hi2c);
It does not fix the issue.
Also of note: HAL_I2C_ErrorCallback() is called at the end of every transaction, even when the host sends the expected number of bytes and the transaction completes normally. The error code is HAL_I2C_ERROR_AF, whose meaning is unclear to me. The datasheet does not mention it as a possible error condition, and it doesn't seem to correspond to any of the bits in the status register. It appears to trigger at the end of every TX transaction, presumably because the host did not ACK the last byte. It is unclear whether anything needs to be done to handle it.
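If the presumption above is right (the host NACKs the final byte it reads, and the HAL reports that as HAL_I2C_ERROR_AF), the error callback could separate that expected case from real faults. Below is a minimal sketch of such a check, not a confirmed fix. The names classify_i2c_error, i2c_err_action_t and was_transmitting are hypothetical, and the two constants are only mirrored here so the snippet compiles without the HAL; in a real project they come from stm32g4xx_hal_i2c.h and the values should be checked there.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors of the HAL error bits so this sketch builds without the HAL.
 * Assumption: values match stm32g4xx_hal_i2c.h; verify before relying on them. */
#ifndef HAL_I2C_ERROR_AF
#define HAL_I2C_ERROR_NONE 0x00000000U
#define HAL_I2C_ERROR_AF   0x00000004U
#endif

/* Hypothetical result type: what the error callback should do next. */
typedef enum {
    I2C_ERR_IGNORE,   /* expected end of a host read, just re-arm listening */
    I2C_ERR_RECOVER   /* anything else, run the full recovery path */
} i2c_err_action_t;

/* was_transmitting is a flag the application would have to track itself,
 * e.g. set when the address callback reports that the host is reading. */
i2c_err_action_t classify_i2c_error(uint32_t error_code, bool was_transmitting)
{
    if (error_code == HAL_I2C_ERROR_NONE) {
        return I2C_ERR_IGNORE;
    }
    /* AF on its own, while we were the transmitter: treat it as the host
     * NACKing the last byte to end the read, not as a fault. */
    if (error_code == HAL_I2C_ERROR_AF && was_transmitting) {
        return I2C_ERR_IGNORE;
    }
    return I2C_ERR_RECOVER;
}
```

With this in place, HAL_I2C_ErrorCallback() would call classify_i2c_error(HAL_I2C_GetError(hi2c), was_transmitting) and only attempt a peripheral reset when the result is I2C_ERR_RECOVER. Whether ignoring the AF case is actually safe on the G431 is something I have not verified.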
How do I make this reliable? My device accepts one byte written which is a register bank select, and then the host either continues to write up to the number of bytes in that bank, or issues a restart condition and reads up to the number of registers in that bank.
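To make the intended protocol concrete, here is a hardware-independent sketch of the register-bank logic described above. It deliberately contains no HAL calls; the idea is that the address, receive, transmit and listen-complete callbacks would each forward to one of these functions. All names (regbank_t, regbank_on_address and so on) and the bank count and size are hypothetical, and returning 0xFF for reads past the end of a bank is just an assumption for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS 2u   /* hypothetical number of register banks */
#define BANK_SIZE 4u   /* hypothetical number of registers per bank */

typedef struct {
    uint8_t regs[NUM_BANKS][BANK_SIZE];
    uint8_t bank;        /* bank chosen by the select byte */
    size_t  index;       /* next register within the bank */
    bool    have_bank;   /* select byte seen in this transaction */
} regbank_t;

/* START or repeated START addressed to us. */
void regbank_on_address(regbank_t *rb, bool host_is_reading)
{
    rb->index = 0;
    if (!host_is_reading) {
        rb->have_bank = false;   /* a write phase always begins with the select byte */
    }
}

/* Host wrote one byte. Returns false if the byte should be NACKed. */
bool regbank_on_write(regbank_t *rb, uint8_t byte)
{
    if (!rb->have_bank) {
        if (byte >= NUM_BANKS) {
            return false;        /* unknown bank */
        }
        rb->bank = byte;
        rb->have_bank = true;
        rb->index = 0;
        return true;
    }
    if (rb->index >= BANK_SIZE) {
        return false;            /* host wrote past the end of the bank */
    }
    rb->regs[rb->bank][rb->index++] = byte;
    return true;
}

/* Host wants one byte. Reads with no bank selected, or past the end, give 0xFF. */
uint8_t regbank_on_read(regbank_t *rb)
{
    if (!rb->have_bank || rb->index >= BANK_SIZE) {
        return 0xFFu;
    }
    return rb->regs[rb->bank][rb->index++];
}

/* STOP or any error: drop partial state so the next transaction starts clean. */
void regbank_on_stop(regbank_t *rb)
{
    rb->have_bank = false;
    rb->index = 0;
}
```

Because every transaction is handled one byte at a time and the state is cleared on STOP or error, a host that stops early leaves nothing half-finished in this layer. It does not by itself explain or fix the stuck interrupt in the HAL.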
It needs to be robust and able to recover from errors. Thus far the only recovery I have found is to reset the MCU.
