Sporadic I2C Errors (HAL/Communication Errors)
- November 16, 2022
Hello, in our company we use an STM32MP153A microprocessor for a large project. I am currently writing a device driver for the TDK ICM42688P MEMS gyro. The gyro is connected through I2C to the Cortex-M4 core within the MPU. Both devices are located on a custom PCB and the gyro is routed to the I2C2 interface of the MPU (I2C2_SCL = PZ6, I2C2_SDA = PZ7). The gyro is the only device on this interface (I2C2). We use an I2C voltage of 1.8 V and both devices support this voltage. The I2C2 interface is used in polling mode.
For my driver I have to read multiple registers of the gyro, so I use the HAL_I2C_Mem_Read() function of the STM32 HAL generated by CubeMX. I noticed that I get random communication errors, so I wrote a simple test case in which I only read the device ID of the gyro in a while loop. Here is the pseudocode of this test case so you have an idea how it works:
while(1)
{
    /* Read the WHO_AM_I register (device ID) of the gyro */
    retVal_e = icm42688pReadRegister(ICM42688P_BANK_0_WHO_AM_I, &regValue_u8);
    if(retVal_e == Std_ReturnType_CommError)
    {
        TRACE_ERROR("This is a Communication Error");
        errorCounter_u16++;
    }
    osDelay(10); /* time in milliseconds */
}
retVal_e contains the status of the communication. If everything is fine, the loop waits for 10 ms and does another read. If there is an error, the error counter is incremented and a message is printed on the console. Initially the osDelay in this test case was set to 1 ms. With that setting I get roughly 200 errors in 30 s, but our company has a zero error tolerance, so this is unacceptable.
After this I tested whether the other I2C devices on the PCB also show errors. On the I2C1 interface we have an MS5837 pressure sensor and an RX8804 real-time clock. I tested both devices in a very similar way, by polling some data in a while loop. The communication with both devices was flawless, even with no delay at all (just the read function in the while loop). We also have an INA231 power monitor on the I2C3 interface. I tested this device too and the communication was again flawless, even without a delay. Note that both of these I2C interfaces (I2C1 and I2C3) are configured in Fast Mode with the standard timing (tr and tf = 0 ns), the analog filter enabled and the digital filter coefficient set to 0. Both interfaces also use DMA for transfers larger than 10 bytes, so the test cases did not use DMA (only a single register read in the while loop).
Back to the I2C2 interface. I looked at the bus and the supply voltage with a very fast scope and saw that the pull-up resistors were a bit too large, because the clock high phases were nearly spikes, so I changed the 4.7k pull-ups to 1.8k pull-ups. After that the bus looked really clean and the serial decoding of the scope recognized the correct communication sequence. I attached a picture of the normal sequence (device address: 0x68, register address: 0x75, device ID: 0x47). I also get far fewer errors now, roughly 10 errors in 30 s (with 1 ms delay).
I set a breakpoint on the faulty communication path so I could observe the bus on the scope when a communication error occurs. When a communication error occurred, the bus still looked fine and I could see no significant difference from the previous sequences. I therefore assume the fault is in the MPU or in the software and not in the gyro or on the bus.
While searching for a solution I also tested Standard Mode instead of Fast Mode. Strangely, I usually get more errors in Standard Mode than in Fast Mode. If I disable the analog filter I get slightly fewer errors. If I enable the analog filter and set the digital filter coefficient to 15 (the maximum), I get more errors. I also measured the actual rise and fall times on the bus (10% to 90%: tr ≈ 255 ns, tf ≈ 25 ns) and used these values for the timing in CubeMX. After a few more tests, I get the fewest errors with Fast Mode, the analog filter disabled, the digital filter coefficient set to 0, a rise time of 300 ns and a fall time of 30 ns. Even with these settings I still get roughly 4 errors in 30 s (with 1 ms delay). Keep in mind that the communication on the bus looks like the picture in the attachment and no error is visible on the bus.
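As a rough cross-check of the pull-up change and the measured rise time, the bus can be modelled as a single RC stage. This is only a sketch: it assumes an ideal exponential edge, and the function names are made up for illustration. Under that assumption the 10%-90% rise time is R*C*ln(9), while the I2C specification quotes rise time between 30% and 70% of the supply, which is R*C*ln(7/3).

```c
/* Rise-time estimate for an open-drain bus modelled as one RC stage.
 * Assumption: V(t) = Vdd * (1 - exp(-t / (R * C))), i.e. an ideal
 * exponential edge. Real edges will deviate from this model. */
#define LN_9    2.1972  /* ln(0.9 / 0.1), for the 10%-90% rise time */
#define LN_7_3  0.8473  /* ln(0.7 / 0.3), for the 30%-70% rise time */

/* Bus capacitance in farads implied by a measured 10%-90% rise time. */
double bus_capacitance_f(double tr_10_90_s, double r_pullup_ohm)
{
    return tr_10_90_s / (LN_9 * r_pullup_ohm);
}

/* Predicted 10%-90% rise time in seconds for a given pull-up and load. */
double rise_time_10_90_s(double r_pullup_ohm, double c_bus_f)
{
    return LN_9 * r_pullup_ohm * c_bus_f;
}

/* Predicted 30%-70% rise time in seconds (the definition used for tr
 * in the I2C specification). */
double rise_time_30_70_s(double r_pullup_ohm, double c_bus_f)
{
    return LN_7_3 * r_pullup_ohm * c_bus_f;
}
```

With the values from this post, bus_capacitance_f(255e-9, 1800.0) gives roughly 64 pF, and feeding that capacitance into rise_time_10_90_s() with the original 4.7k pull-ups gives roughly 0.67 µs, which would fit the nearly spike-shaped clock high phases seen before the change.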
Through some debugging I traced the errors to timeouts while waiting for flags. Over a period of 10 minutes I captured 3 timeouts for the RXNE flag in the HAL_I2C_Mem_Read() function. I also saw 3 timeouts for the TXIS flag in I2C_RequestMemoryRead(), which is called from HAL_I2C_Mem_Read(). In the same I2C_RequestMemoryRead() function I got 6 timeouts for the TC flag. Each time the communication on the bus looked good, but the MPU did not get the right sample.
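To see what the peripheral itself reports at the moment of such a timeout, it can help to snapshot the I2C_ISR register and decode it. The following is a minimal sketch: decode_i2c_isr() is a hypothetical helper, and the bit positions are taken from the usual STM32 I2C_ISR register layout, so they should be checked against the reference manual of the STM32MP153 before relying on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed I2C_ISR bit positions; verify against the reference manual. */
typedef struct { uint32_t mask; const char *name; } IsrFlag;

static const IsrFlag isr_flags[] = {
    {1u << 0,  "TXE"},   {1u << 1,  "TXIS"},    {1u << 2,  "RXNE"},
    {1u << 3,  "ADDR"},  {1u << 4,  "NACKF"},   {1u << 5,  "STOPF"},
    {1u << 6,  "TC"},    {1u << 7,  "TCR"},     {1u << 8,  "BERR"},
    {1u << 9,  "ARLO"},  {1u << 10, "OVR"},     {1u << 12, "TIMEOUT"},
    {1u << 15, "BUSY"},
};

/* Writes the names of all set flags into out, separated by spaces,
 * and returns how many names were written. Stops early if out is full. */
int decode_i2c_isr(uint32_t isr, char *out, size_t out_len)
{
    size_t used = 0;
    int count = 0;

    if (out == NULL || out_len == 0) {
        return 0;
    }
    out[0] = '\0';
    for (size_t i = 0; i < sizeof isr_flags / sizeof isr_flags[0]; i++) {
        if ((isr & isr_flags[i].mask) == 0) {
            continue;
        }
        int n = snprintf(out + used, out_len - used, "%s%s",
                         count ? " " : "", isr_flags[i].name);
        if (n < 0 || (size_t)n >= out_len - used) {
            out[used] = '\0';  /* drop the partially written name */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

In the HAL the register is reachable through the handle as hi2c->Instance->ISR, so a snapshot could be taken right where the flag wait returns a timeout and then printed with the existing trace macro. Knowing whether NACKF, ARLO, OVR or BUSY is set at that moment narrows down which of the three waits (RXNE, TXIS, TC) failed for which reason.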
I have already looked at the errata sheet of the STM32MP153A (ES0438 Rev 7). There is a case (2.19.1) of wrong data sampling if the data setup time is shorter than one I2C kernel clock period. As a workaround I increased the I2C kernel clock frequency from the previous 64 MHz (clock source HSI) to 100.5 MHz (clock source PCLK1) and adjusted the timing accordingly. This workaround did not help and the errors did not vanish. I even tried a much slower I2C kernel clock of 4 MHz (clock source CSI), but this did not work either. The next case (2.19.2) describes bus errors that result in the BERR flag being set. I checked this flag, but it is never set when I get an error. The next case (2.19.3) is not relevant because we do not have a multi-master bus. The case after that (2.19.4) is also not relevant because we have clock stretching enabled. The last case (2.19.5) describes a stall after the transfer of the first byte if the ratio of the I2C APB clock to the I2C kernel clock is between 1.5 and 3.0. The workaround is to change this ratio to be higher than 3 or lower than 1.5. The APB clock is set to 100.5 MHz and cannot be changed due to the project requirements, but I already changed the I2C kernel clock to 100.5 MHz and to 4 MHz (see above), so the ratio is either 1.0 or 25.1, and this workaround did not help either.
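The ratios for case 2.19.5 are easy to check in a few lines. This is only a sketch based on the wording of the erratum as summarised above; ratio_in_errata_window() is a hypothetical helper, and treating the boundaries 1.5 and 3.0 themselves as affected is a conservative assumption.

```c
#include <stdbool.h>

/* Returns true if the APB-clock to kernel-clock ratio falls inside the
 * window described for erratum 2.19.5 (between 1.5 and 3.0).
 * Assumption: the boundary values are treated as affected, because the
 * workaround asks for a ratio higher than 3 or lower than 1.5. */
bool ratio_in_errata_window(double apb_hz, double kernel_hz)
{
    double ratio = apb_hz / kernel_hz;
    return ratio >= 1.5 && ratio <= 3.0;
}
```

With the APB clock at 100.5 MHz, a kernel clock of 100.5 MHz (ratio 1.0) or 4 MHz (ratio about 25.1) is outside the window. The 64 MHz HSI setting gives 100.5 / 64 ≈ 1.57, which by this reading lies just inside the window, so results obtained with the 64 MHz kernel clock may need to be judged separately from the other two settings.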
I observed that if I increase the delay in the while loop from 1 ms to 10 ms or even 100 ms, I get no errors within 5 min. I also tested the communication with a delay of 10 ms for 30 min in two runs. In the first run I did not get any error at all and in the second run I got a single error. For these tests I changed the I2C kernel clock back to the previous 64 MHz. This looks like a workaround, but we have a zero error tolerance and we need a communication rate of at least 200 Hz, so a delay of 10 ms is already too long.
As you can see, I have already tested a lot of things and I urgently need some help. I hope you have some ideas for things I can still test, and maybe one of them is the right solution for my problem.
