Skip to main content
Explorer
February 11, 2020
Solved

[bug fixes] STM32H7 Ethernet

  • February 11, 2020
  • 34 replies
  • 44437 views

@Amel NASRI​, @ranran​, @Piranha​, @Harrold​, @Pavel A.​ 

V2 of my fixes and improvements to H7_FW V1.5.0/V1.6.0 Ethernet...

Changes include

  • Decoupling receive buffers from receive descriptors so buffers may be held any length time without choking receive.
  • Optionally queuing transmit, so transmit doesn't need to block until complete.
  • Many bug fixes.

Find full details and source in the attached zip. V1 was posted to another developer's question. Please post any questions about V2 here.

    This topic has been closed for replies.
    Best answer by Amel NASRI

    Dear All,

    Our Experts tried to answer almost all the limitations reported in this thread.

    Please refer to this post for more details.

    At this point, I suggest to close this discussion as it becomes difficult for us to follow it with the great number of comments.

    Don't hesitate to submit your new posts asking new questions.

    Thanks for all the ones involved to make ST solutions more efficient.

    -Amel

    34 replies

    alisterAuthor
    Explorer
    February 12, 2020

    Errata to V2.0 documentation:

    • “MEMP_NUM_PBUF should be 0�? is incorrect. It’s needed for tcp_write to use PBUF_ROM where data isn’t copied.
    • LOCK_TCPIP_CORE/UNLOCK_TCPIP_CORE in ethernetif_input is mentioned twice. The first mention is correct. It’s not necessary and should be removed because tcpip_inpkt manages threading by message queue.
    • On page 10 describing integrating the changes with your Cube project, should include: you must copy pertinent changes from my ethernetif.c to yours. Your ethernetif.c will be different because you'll have different phy, you may use different receive buffer sizes and counts, you may configure your MPU differently etc. As is, it won't even compile, as I've removed ethernet_link_thread and maybe other things, which you'll have to write yourself.
    Visitor II
    February 15, 2020

    hi , can you please do share the user code also eg: #include "eth.h" and its sources.

    alisterAuthor
    Explorer
    February 16, 2020

    > please do share the user code also eg: #include "eth.h" and its sources.

    No I'll not share my phy code. This is like the BSP code of one of ST's example projects. You have to write this part yourself. Most likely it is limited value to you anyway. I've provided all the source files that ST's Cube would provide.

    [edit] actually I've replaced MX_LWIP_Init and hadn't included that. FW_H7_V1.6.0 moves its call to MX_LWIP_Init to main, which is too early for my app, So in a USER CODE area above main() in main.c I've added “#define MX_LWIP_Init() /* Not used. */�? (without the quotes) and I've excluded LWIP\App\lwip.c from my builds and I've written my own. Other developers have posted Community already about mistakes in MX_LWIP_Init, e.g., incorrectly using netif_* where it should use netifapi_netif_*.

    Visitor II
    February 18, 2020

    hi @alister​  ,

    thank you.

    unfortunately the solution doesnot works and breaks at below.

     /* call user specified initialization function for netif */

     if (init(netif) != ERR_OK) {

      return NULL;

     }

    can somebody please share the references of MX_LWIP_Init issue?

    may thanks in advance

    Visitor II
    February 15, 2020

    How about putting it on Github so that it can be maintained?

    alisterAuthor
    Explorer
    February 16, 2020

    > How about putting it on Github so that it can be maintained?

    I don't have a development board. So no common platform. Everyone's ethernetif.c would be different.

    Finally, the ethernet bug-fixes task is finished where I work and I've other matters to attend.

    alisterAuthor
    Explorer
    February 18, 2020

    To some questions from https://community.st.com/s/question/0D70X000007QuPr/cant-receive-ethernet-packets-in-stm32h750vb...

    > tried to protect 32KB which covers all RX/TX buffers. It did not work.

    Not sure. I'll describe the memories a bit. Hopefully something here helps.

    Three memories areas need care:

    1. Rx buffers. DMA writes direct to memory. If it's cached (a) you'll need to invalidate the buffer using SCB_InvalidateDCache_by_Addr after the receive is finished and before you read the data and (b) because invalidate would destroy unsaved information, e.g. the lwIP pbuf info, it's important the buffer starts at the start of a cache line and ends at the end of a cache line. On the H7 cache lines are 32-bytes.
    2. Tx buffers. DMA reads direct from memory. If it's cached you'll have to save the buffer using SCB_CleanDCache_by_Addr before starting transmit. Buffer alignment doesn't matter because no information's lost.
    3. DMA descriptors. These need every access to be completed in program order. If you don't use the MPU, even if cache is disabled, you'll run into trouble unless you add memory barriers to the software.

    Ideally the rx and tx buffers would be in D2. This is what I've done:

    1. Rx buffers: My app needs a lot of rx buffers and D2's used for other stuff, so I've moved rx buffers to D1. Also as memory's scarce, and I know most the packets are small I've reduced the size of my rx buffers. I've used MPU to make my rx buffers not-cached. In testing I didn't see any change to performance between cached and uncached. Understanding what else my app needs to do and that it'd access the data only once I made the call to leave it not-cached.
    2. Tx buffers: lwIP allocates tx buffers from its heap. So what you do for tx buffers you're doing for any tables it allocates too. For performance we want the tx buffers in D2 (see AN4891 if this isn't clear). So I've put the entire heap in D2 and made it cached.

    For the descriptors, this is what I've done: I'd not added memory barriers. MPU was easier. When ST fix their driver they ought add the necessary memory barriers and make it a Cube configuration so it defines macros that'd compile-in the memory barriers only if MPU's not used because they'd waste cycles if they're not needed.

    This snippet from my map file shows the buffers and descriptors. The section names and miss-spellings were there before I arrived.

    .EthTxBlock 0x30040000 0x4e30
     0x30040000 . = ALIGN (0x4)
     0x30040000 __ETH_TX_START = .
     *(.TxArraySection)
     .TxArraySection
     0x30040000 0x4e30 Common\Eth\LWIP\Target\ethernetif.o
     0x30040000 lwip_custom_ram_heap
     0x30044e30 __ETH_TX_END = .
     
    .EthDescriptorsBlock
     0x30044e30 0x5d0
     0x30045000 . = ALIGN (0x400)
     *fill* 0x30044e30 0x1d0 
     0x30045000 __ETH_DESCRIPTORS_START = .
     *(.RxDecripSection)
     .RxDecripSection
     0x30045000 0x180 Common\Eth\LWIP\Target\ethernetif.o
     0x30045000 DMARxDscrTab
     *(.TxDecripSection)
     .TxDecripSection
     0x30045180 0x180 Common\Eth\LWIP\Target\ethernetif.o
     0x30045180 DMATxDscrTab
     0x30045400 . = ALIGN (0x400)
     *fill* 0x30045300 0x100 
     0x30045400 __ETH_DESCRIPTORS_END = .
     
    .EthRxBlock 0x24000000 0x20000
     0x24000000 . = ALIGN (0x20000)
     0x24000000 __ETH_RX_START = .
     *(.RxArraySection)
     .RxArraySection
     0x24000000 0x1fe00 Common\Eth\LWIP\Target\ethernetif.o
     0x24000000 EthIfRxBuff
     0x24020000 . = ALIGN (0x20000)
     *fill* 0x2401fe00 0x200 
     0x24020000 __ETH_RX_END = .

    __ETH_TX_START is the start of the lwIP heap. In the source code I've provided, this is cached.

    __ETH_DESCRIPTORS_START and section(".RxDecripSection") is the start of the descriptor memory. The section(".TxDecripSection") is a little higher in the memory. In the source code I've provided, this is an MPU region.

    __ETH_RX_START and section(".RxArraySection") is the start of my rx buffers. In the source code I've provided, this is not-cached.

    > you have used two regions starting from 0x30040000

    Check my map-file snippet and details above. Check https://sourceware.org/binutils/docs/ld/SECTIONS.html#SECTIONS.

    > where in stm32h7xx_hal_conf.h file I have to add those new defines

    You don't. This stm32h7xx_hal_conf.h exists for (a) fixing defines like ETH_TX_DESC_CNT that the current Cube doesn't define properly, and (b) adding defines like ETH_PMT_IT_ENABLED which I've added to ST's ethernet driver source code as an example how they might improve that driver's implementation.

    It's ethernetif.c job to integrate lwIP if you use it, your OS if you use one, the ethernet driver and your app.

    For my changed ethernetif.c, you need to:

    1. Change these macros to suit your needs:
      1. For receive: ETH_RX_BUFFER_SIZE, ETH_RX_BUFFER_COUNT, ETH_RX_BUFFERS_ARE_CACHED
      2. For transmit: ETH_TX_BUFFERS_ARE_CACHED, ETH_TX_QUEUE_ENABLE
    2. Place these definitions correctly: DMARxDscrTab, DMATxDscrTab and EthIfRxBuff. You do that in your linker script file or elsewhere in your IDE. I use GCC and haven't fixed or tested anything else. I see __ICCARM__ and __CC_ARM use dodgy addresses. That'd be a problem. If you use them you'll need to change those. I suggest find a better technique and notify ST how because it's better that their future spin of ethernetif.c needs no changes outside of USER CODE blocks.

    > I see that HArdFault_Handler() happens immediately

    Perhaps D2 needs its clock enabled. See AN4891. In rev 3 it's on page 19. It's discussed elsewhere on Community.

    > I could not find any definitions for this Callback function!.

    My changed stm32h7xx_hal_eth.h file needs to be placed in an include directory your compiler searches before the "Drivers/STM32H7xx_HAL_Driver/Inc" directory.

    > I am not going to use LWIP!

    Assuming you're not meaning you're using a different stack, you'll need some way to queue packets where a packet may be a chains of buffers. Even bare-metal you'd implement queues. Even if you processed received packets immediately and discarded you'd still need queues... because your list of unused buffers is a queue. LwIP does it with struct pbuf. You'd need it for rx and tx. So your struct would need members for at least buffer length and next. For RxPoolInit, link all your buffers onto your free list. For RxBufferAlloc, allocate one buffer from your free list. For RxBuffFree, return one buffer to your free list. For RxPktAssemble, chain buffers by their next. For RxPktDiscard, return each buffer of a chain to your free list.

    Graduate
    March 6, 2020

    Hello, I would like to add ethernet support to my TouchGFX application.

    I have got the lwIF directory from Cube and insert in TouchGFX Makefile with:

    components := TouchGFX/gui target TouchGFX/generated/gui_generated LwIP

    but I got many errors about include location, after many changes I have now this error if I use this define:

    IP4_ADDR(&netmask, 255,255,255,255);:

    Drivers/BSP/STM32746G-Discovery/stm32746g_discovery.c: In function 'udp_echo_init':

           LwIP/include/lwip/ip4_addr.h:120:44: error: 'struct ip_addr' has no member named 'addr'; did you mean 'u_addr'?

            #define IP4_ADDR(ipaddr, a,b,c,d) (ipaddr)->addr = PP_HTONL(LWIP_MAKEU32(a,b,c,d))

    alisterAuthor
    Explorer
    March 8, 2020

    netmask's type is ip4_addr_t?

    Visitor II
    November 26, 2021

    Hello Alister or ST colleagues. Can somebody tell if the current version of the STM32H7 ETH driver has fixed the issues of 2019 and 2020?. Thanks for your feedback

    alisterAuthor
    Explorer
    March 8, 2020

    Porting guide for Cube users. Supplements section 9 of my document...

    1. BEFORE porting my bug-fixes & improvements, integrate and test your PHY code with the stock Cube code. Test using ICMP (ping). The stock Cube code operates well enough for this. This is to avoid integrating everything at once and possibly finding you’re unsure what’s working. Ideally, you’d start using my bug-fixes & improvements after discovering the stock Cube is unreliable.
    2. Copy my ethernetif.c and stm32h7xx_hal_eth.c to a source directory of your application, i.e. away from Cube’s LWIP/Target or Drivers/STM32H7xx_HAL_Driver/Src directories. The purpose of this and following steps is to isolate changes to the Cube code so they won’t be trashed when you generate the Cube again later.
    3. Exclude Cube’s LWIP/Target/ethernetif.c and Drivers/STM32H7xx_HAL_Driver/Src/stm32h7xx_hal_eth.c from your build.
    4. Copy my stm32h7xx_hal_conf.h and stm32h7xx_hal_eth.h to a source directory of your application, away from ../Inc or ../Core/Inc, or ../Drivers/STM32H7xx_HAL_Driver/Inc. Configure your compiler’s include paths so that directory’s searched before Cube’s ../Inc or ../Core/Inc, or Drivers/STM32H7xx_HAL_Driver/Inc.
    5. My stm32h7xx_hal_conf.h contains macros that Cube doesn’t configure correctly, e.g. ETH_TX_DESC_CNT and ETH_RX_DESC_CNT, or doesn’t configure yet.
    6. Keep your lwipopts.h at LWIP/Target so Cube continues to control your lwIP configuration. Reconcile your Cube configuration with the parts of my lwipopts.h you want to keep. Keep all my changes between /* USER CODE BEGIN 1 */ and /* USER CODE END 1 */ is easiest. But drop the part commented "Relocate the lwIP heap..." if you don't want to do that, and if you don't do that, you'll have to remove the part with a similar comment in ethernetif.c.
    7. Configure an MPU region so accesses of the rx and tx DMA descriptors are completed in program order. This is mandatory. Use my mpu.c as a guide. You may configure regions to make your rx and tx buffers uncached too. Refer my discussion on the MPU/caching choices earlier in this community post. The MPU configuration should be performed before calling MX_LWIP_Init(). If you’re unsure, use my regions.
    8. Integrate your earlier LWIP/Target/ethernetif.c with my ethernetif.c (in the directory you’d moved it in step 1)…
      1. In the low_level_init function…
        1. Change MII to RMII if that's what your PHY uses.
        2. Remove the lines of code between /* USER CODE BEGIN low_level_init Code 1 for User BSP */ and /* USER CODE END low_level_init Code 1 for User BSP */. Those lines create my PHY task (which I haven’t shared). But Cube creates a PHY task named ethernet_link_thread from MX_LWIP_Init. I regret my PHY deviates from Cube. I suggest you follow Cube. And if Cube doesn’t work for you, tell ST.
        3. Modify or remove my calls of HAL_ETH_SetMACConfig and HAL_ETH_SetMACFilterConfig as appropriate for you.
      2. Copy in the ethernet_link_thread. That's the PHY task described a few steps earlier.
      3. Dimension your rx buffers: ETH_RX_BUFFER_SIZE, ETH_RX_BUFFER_COUNT.
      4. Check ETH_RX_BUFFERS_ARE_CACHED and ETH_TX_BUFFERS_ARE_CACHED are consistent with your MPU configuration.
    9. Integrate your linker script file...
      1. Position the sections for the DMA descriptors and the receive buffer area named in ethernetif.c. Use my mapping_eth.ld as a guide.
      2. Check your configured MCU region sizes match your section sizes. Check their start addresses are aligned their sizes.
    10. I’ve only read/tested GCC (__GNUC__). If you’re using a different compiler check the compiler-specific parts carefully.

    Visitor II
    March 24, 2020

    Thanks @alister​ for sharing your solution, although I'm really disappointed you had to do it in the first place. I ran into the 'overwriting buffer' issue 5 seconds into my first test.

    Just letting you know I'm using the socket interface (maintaining legacy code) and it's working well with your modifications. I haven't done a lot of testing, but so far it appears robust.

    Hi @Amel NASRI​ , do these defects need to be logged as issues in the GitHub repository, or have they been captured here?

    Cheers David

    Visitor II
    April 2, 2020

    Thanks @alister​  for your work. I would like to ask You about another aspect of H7 ethernet controller. H7 ETH controller support TCP hardwere segmentation. Do you know, how to modified LWIP to use this functionality. Maybe You heard about this implementation? In my project I require high data transfer over ethernet between STMH7 and NationalInstrument SbRIO controler. To achive maksimum bandwith I've used direct transfering UDP datagrams omiting LWIP (tx data throwoutput on the level of 98%). I've discovered that this solution leads to hi cpu consumption on SbRIO side (50% usage of cpu only for receiving datagrams). So to deal with this issue I've made code that supported datagram segmentation in IP level (one UDP datagram can transfer around 65kB of data in this configuration). This partially solve problem on SbRIO (Ethernet driver joins ethernet frames in Kernel layer what is much faster). I know that ARM SbRIO procesor was equipded in ethernet controller supporting tcp segmentation so the next logic step in development would be moving with comunication between devices to TCP and hardwer supported segmentation but the question is how to do it on H7 side?

    alisterAuthor
    Explorer
    April 2, 2020

    I can't help much.

    The H7 ETH driver described here does

    1. Tx IP header and payload checksum calculation and insertion (refer RM0433, TDESC3 CIC) .
    2. Appears to have coded support of TSO.

    You might check these:

    1. The STM32H7 reference manual, RM0433... https://www.st.com/content/ccc/resource/technical/document/reference_manual/group0/c9/a3/76/fa/55/46/45/fa/DM00314099/files/DM00314099.pdf/jcr:content/translations/en.DM00314099.pdf.
    2. A Google search of lwIP and TSO... https://pdfs.semanticscholar.org/8f5a/de59cea788141047f5ed19b99d62fc24d0cb.pdf

    Graduate
    April 2, 2020

    I have also used this code successfully. After a detailed review of the modified driver code, I was impressed with some of the subtle bugs and mistakes that alister found in the ST code.

    It will be interesting to see if ST ever take on board the suggestion of a zero copy driver that abstracts buffers away from Ethernet DMA descriptors. The complexity of all the buffer callbacks would seem inconsistent with their typical 'dumb' drivers, but hopefully they do as it is essential for a high performance Ethernet solution.

    Initially I tried to use the modified driver alongside the the other driver code so I could still cleanly regenerate STM32CubeMx code but because of changes in stm32h7xx_hal_eth.h, which is included in the auto generated files it was not practical. Replacement of the stm32h7xx_hal_eth files is necessary, then manual intervention each time STM32CubeMX code is generated after that. A small price to pay given that it actually works.

    This application uses a hardware abstraction layer and OS abstraction (as it runs on embedded hardware or a PC) so took a bit of tweaking. All the code that is required is supplied in the zip folder. On custom hardware, STM32H753 @ 480MHz, using Windows iperf 2 and lwiperf server, I was able to obtain a repeatable 50MBps. I did zero investigation into what was limiting the rate to 50 as that is already 50 times more than this application needs!

    I really wanted to find a bug or improvement in this code so I could submit some feedback but was unable to!

    Thanks alister

    alisterAuthor
    Explorer
    April 2, 2020

    Thanks for the kind thanks. If you put your modified stm32h7xx_hal_conf.h, stm32h7xx_hal_eth.h and stm32h7xx_hal_eth.c files in different directories and configure the compiler to search your includes directory before the HAL driver includes directory, and you exclude stm32h7xx_hal_eth.c in the HAL drivers source directory, you should be able to regenerate Cube without its trashing your changes. That's what I'd tried to explain in the earlier porting-guide post, or am I missing something?

    Graduate
    April 3, 2020

    Actually, since writing that post last night (and forgetting to click submit until this morning), I have managed to do as you said and have it build without modifying the STM32CubeMX generated files. The trick was to realising that STM32CubeMX will delete files that it thinks shouldn't be there, but not folders. Oh, and as you mentioned, making sure the include path to the modified files comes first!

    I never actually saw your post describing how to do it and wrote my own notes (attached) for future reference.

    Visitor II
    April 2, 2020

    I've been following the development and progress that you and others have been making on the H7 ethernet drivers since last November -- only now have I finally had the time to sit down and incorporate all your recommendations. Just wanted to comment to say I'm incredibly thankful for the hard work that you've done here and for generously sharing that with the community. Working with the template code provided was quick and painless to get working, but that was made easier from knowing where to look after spending a lot of time on previous attempts at fixing the ethernet stack last year.

    Now we just have to sit and wait, I'm hopeful that ST will incorporate your changes to their code and make life easier for others.