Visitor II
October 21, 2021
Question

STM32H7 Hal Ethernet Use of LWIP_MEMPOOL_DECLARE

  • October 21, 2021
  • 6 replies
  • 8964 views

Not sure if this is a bug or something I do not completely understand, but in the generated ethernetif.c file (HAL V1.9.0) there is the line:

LWIP_MEMPOOL_DECLARE(RX_POOL, 10, sizeof(struct pbuf_custom), "Zero-copy RX PBUF pool");

If I understand correctly, this always uses a fixed magic number of 10 for the pbuf count. I would think that number should be tied to something like PBUF_POOL_SIZE. Or am I missing the point of this?

    This topic has been closed for replies.

    6 replies

    Graduate II
    October 21, 2021

    The RX_POOL pool is used for allocating pbuf_custom structures to couple with the data buffers received from DMA. Therefore the number of pbuf_custom structures should be exactly the same as the number of DMA receive data buffers. There is no reason why this number should be anything other than ETH_RX_DESC_CNT.

    If the RX_POOL pool is larger than the number of DMA buffers (as in the default configuration), it just wastes some memory; but if it is smaller, it limits performance by making an ETH_RX_DESC_CNT > 10 configuration useless, and wastes significantly more memory than in the first case.
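
    For illustration, tying the pool to the descriptor count in the generated ethernetif.c would look something like this (a sketch; ETH_RX_DESC_CNT is assumed to come from the HAL configuration):

    LWIP_MEMPOOL_DECLARE(RX_POOL, ETH_RX_DESC_CNT, sizeof(struct pbuf_custom), "Zero-copy RX PBUF pool");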

    @Imen DAHMEN, this is another bug in the ridiculously crippled H7 lwIP driver...

    Explorer
    October 21, 2021

    I have a product with many connections of different service qualities (throughput, burstiness, etc.) that are handled by tasks of different priorities. An Ethernet driver that dimensions the Rx buffers equal to the Rx descriptors and binds them together is vulnerable to head-of-line blocking, and is unsuitable for my product and other network products.

    Graduate II
    January 15, 2022

    We are talking about a zero-copy driver here, in which the data buffers are passed to the processing tasks, which release them after processing. Let's suppose the different tasks have not yet processed their data and released the data buffers, and we have run out of free data buffers. What's the point in having additional free DMA descriptors or pbuf_custom structures if we have no more free data buffers to attach to them?

    Technical Moderator
    October 28, 2021

    Hello @alister and @Piranha,

    Thanks for pointing out this issue.

    I have raised this internally with the appropriate team so they can take the necessary action.

    Internal ticket ID 116209 has already been created to solve this issue (PS: This is an internal tracking number and is not accessible or usable by customers).

    Thank you for your contribution.

    Imen

    ST Employee
    December 30, 2022

    Hi,

    As an answer to the main issue (LWIP_MEMPOOL_DECLARE with a fixed pbuf-count):

    Because we are using a ring structure for the descriptors, each descriptor points to 2 data buffers.

    So that the RX_POOL can hold all Rx buffers, the pbuf-count should be equal to 2 * ETH_RX_DESC_CNT.
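
    For reference, that would change the generated line to something like this (a sketch, not verified against every HAL version):

    LWIP_MEMPOOL_DECLARE(RX_POOL, 2 * ETH_RX_DESC_CNT, sizeof(struct pbuf_custom), "Zero-copy RX PBUF pool");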

    Internal ticket ID 142051 has already been created to solve this issue

    (PS: This is an internal tracking number and is not accessible or usable by customers).

    Sorry for the delayed answer and thank you for your interaction.

    Regards

    Mahdy

    Graduate II
    December 31, 2022

    And again complete nonsense from an ST employee... First, the fact that the descriptors are arranged as a ring has nothing to do with the possibility of providing two data buffers for each descriptor. But the main nonsense is the claim that each descriptor points to two data buffers, when the HAL actually does not use buffer 2 of the Rx descriptors.

    Explorer
    November 28, 2023

    Hi all,
    my understanding of the current HAL F7 driver state (STM32F7xx-hal-driver v1.3.0) is:

    1. The DMA descriptors are used in chained mode and use one buffer per descriptor
    2. Each Rx buffer is assigned to a descriptor entry in ETH_UpdateDescriptor
      • This happens for all available descriptors in HAL_ETH_Start or HAL_ETH_Start_IT
      • Or in HAL_ETH_ReadData when we have an unused descriptor
    3. Received data blocks (inside an Rx buffer) get a cache invalidate and are chained together in HAL_ETH_RxLinkCallback (as long as we use a buffer size of 1536), which is called from HAL_ETH_ReadData
      1. As the descriptor is now free, ETH_UpdateDescriptor is called, which allocates another Rx buffer
    4. Once all DMA receive chunks are received, HAL_ETH_ReadData will return the chained pbufs
    5. This pbuf (chain) is then passed to the lwip tcpip task via netif->input(p, netif)
    6. Once the pbuf is no longer needed by the tcpip task, it is freed, pbuf_free_custom is called (sketched after this list), and the Rx buffer can be used again
    7. As long as the tcpip task is processing / holding the pbuf, it can't be reused for receiving new data
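
    A minimal sketch of the release path from steps 6 and 7, loosely modeled on the generated ethernetif.c (the names follow that file, but treat this as an illustration rather than ST's exact code):

    #include "lwip/pbuf.h"
    #include "lwip/memp.h"

    /* Called by lwIP when the pbuf's reference count drops to zero: the
       pbuf_custom wrapper goes back to RX_POOL and the underlying Rx
       buffer can be attached to a descriptor again. */
    static void pbuf_free_custom(struct pbuf *p)
    {
        struct pbuf_custom *custom_pbuf = (struct pbuf_custom *)p;
        LWIP_MEMPOOL_FREE(RX_POOL, custom_pbuf);
        /* Some driver versions also signal the receive path here so it
           knows a free buffer is available. */
    }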

     

    As to my understanding the following rules apply to the RX buffer count:

    1. We should have at least as many Rx buffers as descriptor entries (ETH_RX_BUFFER_CNT >= ETH_RX_DESC_CNT); a build-time guard for this is sketched after the list
    2. It's OK to have more Rx buffers than descriptors to handle high load peaks during reception
      1. This implies that either the receive task has a higher priority than the tcpip task, or the tcpip task holds the receive buffer for longer (e.g. waiting for a timeout before using the buffer to send data back to the network; I'm not too deep into the lwIP stack, but I read that it uses receive buffers for sending out responses)
      2. One can discuss whether it's really useful to receive more frames when the tcpip task is not able to process them, but in my opinion this helps with peak load situations, or when one of multiple lwIP "clients" (tasks that receive data, e.g. via the socket API) is blocked and other tasks and internal things like DHCP should still work reliably
      3. Maybe ST did some testing, observed that they got into the RX_ALLOC_ERROR state in high peak load situations on some platform, and increased the count to 15.
      4. I couldn't see any issues with iperf and ETH_RX_BUFFER_CNT = ETH_RX_DESC_CNT. iperf will not hold received data back to send later, as it's designed to get the most throughput
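
    If one wanted to enforce rule 1 at build time, a simple guard could be added (a hypothetical addition, not present in the generated file):

    #if ETH_RX_BUFFER_CNT < ETH_RX_DESC_CNT
    #error "ETH_RX_BUFFER_CNT must be at least ETH_RX_DESC_CNT"
    #endif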

     

    Note: the numbered lists are just for easier referencing and do not reflect the exact execution order

    @Piranha First of all, thank you for all your hints and lists about the various issues. Can you please confirm my assumptions / breakdown, and extend or comment if you find something wrong or extendable?

    @MSG_ST Do you think it is possible for ST to answer / adapt the following points?

    1. It would be really nice if someone from ST could explain why the code uses an ETH_RX_BUFFER_CNT of 15U, and then also document it in the code.
      Also adapting the @note sections in the file header would be very helpful, as it currently doesn't match the code, which is very confusing.
    2. Please let the user override ETH_RX_BUFFER_CNT somehow (at least by adding an #ifndef block around the define in ethernetif.c, as sketched below)
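
    The sketch for point 2 would be an override guard like this, so a project can change the count without editing the generated file:

    #ifndef ETH_RX_BUFFER_CNT
    #define ETH_RX_BUFFER_CNT 12U
    #endif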

     

    Regards

    Philipp

    Graduate II
    November 30, 2023

     

    Your understanding is correct and there are just a few things to add.

    The PBUFs can be held not only by the TCPIP thread, but also by applications. And it's not just a question of performance, as it can even be done deliberately as a feature. For example, an application can implement some buffering by holding PBUFs and only releasing them some time later. The order of releasing PBUFs also can and will differ from the reception order, both from a single application and definitely between several applications. All of this together means that the number of Rx buffers must be high enough to more or less satisfy the requirements of all applications, including their buffering requirements.
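
    As a hypothetical illustration of such deliberate buffering (app_recv, app_done and the held pointer are invented for the example):

    #include "lwip/pbuf.h"

    static struct pbuf *held;

    /* Keep the zero-copy pbuf instead of processing it immediately. */
    void app_recv(struct pbuf *p)
    {
        held = p;
    }

    /* Only now does the underlying Rx buffer return to the driver. */
    void app_done(void)
    {
        pbuf_free(held);
        held = NULL;
    }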

    Now, let's analyse how the number of DMA descriptors relates to the number of Rx buffers.

     

    ETH_RX_BUFFER_CNT < ETH_RX_DESC_CNT

    In this configuration there are not enough buffers to attach to all descriptors and therefore the surplus descriptors are just wasted. The system can still work, but it just doesn't make sense.

     

    ETH_RX_BUFFER_CNT > ETH_RX_DESC_CNT

    In this configuration we have surplus buffers, which can be held by applications and then returned to the Rx pool. But a buffer which is sitting in an Rx pool and is not used for anything is again wasted.

     

    ETH_RX_BUFFER_CNT == ETH_RX_DESC_CNT

    In this configuration, when a buffer is released, it doesn't even have to be put in an Rx pool, but can immediately be attached to a descriptor. Therefore the software-managed Rx pool becomes useless, because it would always be empty, and it can and should be removed from the design completely.

     

    I've tried to explain this several times previously, but everyone misunderstood it, including the advanced users. For example, ST's code defaults to 4 descriptors and 12 buffers. When I say that those numbers must be equal, everyone interprets it as "reduce the number of buffers to 4" and counter-argues how limited such a configuration would be. But for some strange reason nobody interprets it as "increase the number of descriptors to 12", which is what I was actually saying. I will remind again that the descriptors are small and their size, compared to the Rx buffers, is minor. Also Alister worried about head-of-line blocking, but that is not the case here, because the buffers are not bound to particular descriptors and I'm not suggesting they should be.
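
    In concrete numbers, a sketch of what is actually being suggested with ST's default values:

    /* Not "reduce the buffers to 4", but "raise the descriptors to 12". */
    #define ETH_RX_DESC_CNT   12U
    #define ETH_RX_BUFFER_CNT ETH_RX_DESC_CNT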

    @alister, @LCE, this could be interesting for you.

    ST Employee
    November 29, 2023

    Hi @PhilippH,

     

    Thank you for your detailed comment. I fully agree with your assumptions.

    1-

    I confirm from our side that the Rx buffer count should be at least equal to the number of descriptors, and that the maximum Rx buffer count can exceed the number of descriptors in order to have a pool size which can handle high load peaks in some heavy applications. The comment section will be updated accordingly.

    And that's why you can find ETH_RX_BUFFER_CNT = 15U or 12U...

    2-

    ETH_RX_BUFFER_CNT will be made configurable via STM32CubeMX to allow users to modify it according to their use cases.

     

    Thank you again for your contribution

    Regards

    Graduate II
    November 30, 2023

     

    ETH_RX_BUFFER_CNT > ETH_RX_DESC_CNT

    > In this configuration we have surplus buffers, which can be held by applications and then returned to the Rx
    > pool. But a buffer which is sitting in an Rx pool and is not used for anything is again wasted.

    I found that this is the configuration which works best for my application: no OS, a kind of audio streaming, an HTTP REST interface, and calling an Rx-descriptor / pbuf reallocation function in main / while / LwIpProcess (when not sleeping).

    So I simply define:

     

    #define ETH_RX_DESC_CNT ( PBUF_POOL_SIZE - 8 ) /* must be lower than pool size */

     

    with 64 pbufs in the pool, and 32 for my test boards without external RAM (where I need more internal RAM for the more important Tx buffers; Rx buffers are mostly used to handle TCP ACKs).

    But don't ask me what the problem was; my H7 Ethernet driver is more than a year old now, and I'm good at forgetting. Maybe pbuf chains were still being worked on. I'll have to check my notes...