Skip to main content
Piranha
Principal III
September 17, 2022
Question

[BUG] STM32 lwIP Ethernet driver Tx deadlock

  • September 17, 2022
  • 15 replies
  • 23671 views

This bug is present in ethernetif.c files for lwIP generated by CubeMX for the newer reworked ETH drivers.

Problem

When using CMSIS-RTOSv2, CubeMX generates the following code in the function low_level_output():

 

pbuf_ref(p);
HAL_ETH_Transmit_IT(&heth, &TxConfig);
while(osSemaphoreAcquire(TxPktSemaphore, TIME_WAITING_FOR_INPUT)!=osOK)
{
}
HAL_ETH_ReleaseTxPacket(&heth);

 

The TxPktSemaphore is initialized with both maximum and (wrongly) initial count of 1. Because the semaphore count is already 1, the osSemaphoreAcquire() call returns immediately and the HAL_ETH_ReleaseTxPacket() can release the packet before the transmission has actually been completed. While the Tx complete interrupt happens after the osSemaphoreAcquire() call, the code continues running in this crippled mode, corrupting packets and driver/stack state.

When the Tx complete interrupt happens before the osSemaphoreAcquire() call, the osSemaphoreRelease() call in the Tx complete interrupt tries to increase the semaphore count, but fails to do it because the count already is at the maximum. At this moment the code actually corrects the wrongly initialized semaphore count and the osSemaphoreAcquire() call starts to actually wait for the Tx complete interrupt. That means the packets will not be corrupted further, but the driver/stack state can potentially already be damaged and is not guaranteed to recover.

After the semaphore count is corrected or if CMSIS-RTOSv1 is used, another issue can happen. The HAL_ETH_Transmit_IT() function can return an error if, for example, there are not enough descriptors available at the moment. In this case the transmission will not be started at all and the osSemaphoreAcquire() call will wait forever. This means that the function low_level_output() and therefore the whole lwIP core is deadlocked.

In addition, if the low_level_output() function is designed to block until the packet transmission is complete, there is absolutely no point in using the pbuf_ref() and the respective pbuf_free() in HAL_ETH_TxFreeCallback() function. Incrementing the PBUF reference count is necessary for a design, where the low_level_output() function doesn't block and the PBUF is released asynchronously.

The RxPktSemaphore has exactly the same wrong semaphore initial count, but, apart from a single additional useless loop iteration in the ethernetif_input() function, the bug has no critical consequences.

Solution

First, initialize the semaphores with a count of 0 in function low_level_init(). Therefore this CubeMX script in ethernetif_***.ftl files:

 

[#if cmsis_version = "v2"]
 RxPktSemaphore = osSemaphoreNew(1, 1, NULL);
 TxPktSemaphore = osSemaphoreNew(1, 1, NULL);
[#else][#-- else cmsis_version --]
 RxPktSemaphore = xSemaphoreCreateBinary();
 TxPktSemaphore = xSemaphoreCreateBinary();
[/#if][#-- endif cmsis_version --]

 

Must be changed to this:

 

[#if cmsis_version = "v1"]
 RxPktSemaphore = osSemaphoreCreate(osSemaphore(RxSem), 1);
 TxPktSemaphore = osSemaphoreCreate(osSemaphore(TxSem), 1);
 /* Decrease the semaphores' initial count from 1 to 0 */
 osSemaphoreWait(RxPktSemaphore, 0);
 osSemaphoreWait(TxPktSemaphore, 0);
[#else][#-- else cmsis_version --]
 RxPktSemaphore = osSemaphoreNew(1, 0, NULL);
 TxPktSemaphore = osSemaphoreNew(1, 0, NULL);
[/#if][#-- endif cmsis_version --]

 

In addition to fixing the bugs, I've made other improvements. The CMSIS-RTOSv1 code was using FreeRTOS API, which was functionally correct but inconsistent. The code examples are also using the same inconsistent mix of those two APIs. Also I swapped the order of [if-else] blocks for the CubeMX script to a more logical one and consistent with other parts of the script. I would recommend ST to keep the same order everywhere, which is not the case currently.

Second, the return value of the HAL_ETH_Transmit_IT() function must be checked and, if something other than HAL_OK was returned, the osSemaphoreAcquire() and HAL_ETH_ReleaseTxPacket() calls must be skipped and instead the pbuf_free() must be called to revert the increment done by the earlier pbuf_ref() call, if it is used.

 

pbuf_ref(p);
if (HAL_ETH_Transmit_IT(&heth, &TxConfig) == HAL_OK) {
	while(osSemaphoreAcquire(TxPktSemaphore, TIME_WAITING_FOR_INPUT)!=osOK)
	{
	}
	HAL_ETH_ReleaseTxPacket(&heth);
} else {
	pbuf_free(p);
}

 

Additional information

Although with incomplete description of consequences and their severity, this bug and a correct solution has also been reported there:

https://github.com/STMicroelectronics/STM32CubeF4/issues/127

And the fact of a deadlock happening is also reported there:

https://github.com/STMicroelectronics/STM32CubeH7/issues/224

15 replies

PHolt.1
Senior
September 18, 2022

Why are the semaphores needed?

I have been told repeatedly that LWIP does not call TX in multiple threads, even if multiple FreeRTOS tasks are outputting data concurrently.

Piranha
PiranhaAuthor
Principal III
September 20, 2022

As those three lines of code shows, the TxPktSemaphore is used for waiting on hardware Tx completion, not some mutual exclusion. Of course, a decent design should just add the packet to the DMA Tx queue, return immediately and then release the sent packets on a Tx complete interrupt. But this bug report is for ST's code, not something decent!

By the way, when this bug will be fixed, most likely ST's stated 92 Mbps Tx throughput will decrease significantly.

PHolt.1
Senior
September 20, 2022

OK thanks; so this is not relevant to my code which was ST's (adapted) one from ~ 2 years ago.

jacekowski
Visitor II
September 26, 2022

There is still an issue with the semaphore that is not fixed by the proposed fix,

If interrupt happens between

hal_status = HAL_ETH_Transmit_IT(&heth, &TxConfig);

and

while(osSemaphoreAcquire(TxPktSemaphore, TIME_WAITING_FOR_INPUT)!=osOK)

Then there is a chance that HAL_ETH_TxCpltCallback will be called between those two and you will still have the code stopping at this line.

Piranha
PiranhaAuthor
Principal III
September 28, 2022

No, that is not how semaphores work. The ISR calls the osSemaphoreRelease(), which increases the semaphore count to 1. When the count is non-zero, the osSemaphoreAcquire() decreases the count and returns immediately with a return value of osOK. The order of those calls doesn't matter - the logic will still be correct. If that would not be the case, the semaphore would be useless. And this behavior is not specific to CMSIS-RTOS or FreeRTOS - that is the way semaphores work in general.

jacekowski
Visitor II
September 28, 2022

Issue in my case was due to eiuther sending things from multiple tasks or from within interrupt context (i didn't really check which of those two things was actually breaking it, just fixed both potential issues) with fairly high interrupt load, so in my case i would end up with 2x release first (because of semaphore max count being 1 it leaves semaphore at 1) then acquire that works, then 2nd acquire that locked.

ktrofimo
Senior III
October 4, 2022

0693W00000UnKuvQAF.png

 /* create a binary semaphore used for informing ethernetif of frame transmission */
 TxPktSemaphore = osSemaphoreNew(1, 0, NULL);
void HAL_ETH_RxCpltCallback(ETH_HandleTypeDef *handlerEth)
{
 osSemaphoreRelease(RxPktSemaphore);
}

Still locks :(

ktrofimo
Senior III
October 4, 2022

Update:

Both TCP/IP task and Eth input task both are blocked at the same event object 0x200090ac0693W00000UnKyYQAV.png

Piranha
PiranhaAuthor
Principal III
October 5, 2022

https://github.com/STMicroelectronics/STM32CubeH7/issues/224#issuecomment-1199200853

It seems that you are still using the ST's broken link detection code. I've pointed it out and explained what is wrong with it in my main Ethernet & lwIP issue topic since the beginning and I am repeating it again and again, but nobody is listening. ST, forum users and almost everyone just keeps silent and pretends that those issues doesn't exist. It's a mass insanity!

To fix it, you can get inspired from these links:

https://community.st.com/s/question/0D53W00000zHH9PSAW/changing-ip-address-mac-address-during-runtime

https://github.com/stm32-hotspot/STM32H7-LwIP-Examples/blob/b4bcc6cffa109ff44b875165b5cd5aec04f5df22/STM32H743_Nucleo_ETH/LWIP/Target/ethernetif.c#L798

P.S. I cannot know whether your other threads, which are using lwIP API, respect the rules.

ktrofimo
Senior III
October 6, 2022

Thank you for your efforts!

I think I did implemented all improvements you mentioned. My code was running for weeks under constant load (17Mb/s: vnc, http, https, vpn, modbus, ping). But I noticed that a simple code could break the whole thing.

For example simple netconn UDP client works like a charm:

struct netconn *sendconn = netconn_new( NETCONN_UDP );
netconn_bind(sendconn, &WG_LOCAL, 55155 );
netconn_connect(sendconn, &WG_SERVER, 55155);
struct netbuf* sendbuf = netbuf_new();
uint8_t *data = netbuf_alloc(sendbuf, strlen(message) );
memcpy(data, message, strlen(message) );
netconn_send(sendconn, sendbuf);
netbuf_free(sendbuf);
netbuf_delete(sendbuf);
netconn_disconnect(sendconn);
netconn_delete(sendconn);

But the same RAW client breaks the whole thing:

/* THIS CODE PRETENDS TO WORK, BUT CAUSES LWIP TO FAIL */
struct udp_pcb* my_udp = udp_new();
udp_connect(my_udp, &PC_IPADDR, 55155 );
struct pbuf* udp_buffer = pbuf_alloc(PBUF_TRANSPORT, strlen(message), PBUF_RAM);
if (udp_buffer != NULL)
{
	memcpy(udp_buffer->payload, message, strlen(message));
	udp_send(my_udp, udp_buffer);
	pbuf_free(udp_buffer);
}

 From "lwIP Multithreading" (https://www.nongnu.org/lwip/2_1_x/multithreading.html) :

"Netconn or Socket API functions are thread safe against the core thread but they are not reentrant at the control block granularity level. That is, a UDP or TCP control block must not be shared among multiple threads without proper locking."

Pavel A.
Super User
October 5, 2022

In the new ST "sandbox" there is yet another collection of ETH & LwIP examples with a nice readme.

Native CubeIDE projects, no more hassles with import and conversion.

https://github.com/stm32-hotspot/STM32H7-LwIP-Examples

It is not clear how to inform the maintainers of a bug. See CONTRIBUTING.md.

Piranha
PiranhaAuthor
Principal III
October 30, 2022
MSG_ST
ST Employee
January 10, 2023

Hi,

The proposed fix was tested and it negatively impacts the performance of the driver.

Without the solution proposed :

RxPktSemaphore = osSemaphoreNew(1, 1, NULL);

TxPktSemaphore = osSemaphoreNew(1, 1, NULL);

--> Client : 92.9 Mbits/s

--> Server : 95 Mbits/s 

With the solution proposed :

RxPktSemaphore = osSemaphoreNew(1, 0, NULL);

TxPktSemaphore = osSemaphoreNew(1, 0, NULL);

--> Client : 75.7 Mbits/s

--> Server : 94.9 Mbits/s  

So du to the fact that the performance is paramount and the problem reported is extremely rare, actually this fix will not be implemented.

But we will save it to be implemented on a global enhancement solution.

Thank you for your cooperation and for sharing the encountered issues.

Regards

Mahdy

jacekowski
Visitor II
January 10, 2023

I really hope this is a bad joke.

It is not extremely rare, it is extremely frequent.

PHolt.1
Senior
January 10, 2023

Exactly. The perf reduction is absolutely negligible. How many embedded systems are a) needing and b) actually achieving (via a real application and a real tcp/ip stack) 75mbps?

ktrofimo
Senior III
January 10, 2023

A​gree. I can sacrifice bandwidth but absolutely can't sacrifice stability.