Visitor II
September 26, 2019
Solved

Ethernet performance & packet loss

  • September 26, 2019
  • 6 replies
  • 5981 views

Hello,

We are trying to test our Ethernet performance (STM32H7).

We are using a UDP echo with lwIP, and we inject UDP packets from a PC into the STM32H7.

The performance is OK with most packets, but when decreasing the packet size below ~300 bytes, we start to see packet loss.

My question is:

  1. Is there a benchmark document for the STM32H7 showing the expected Ethernet performance?
  2. Is there a loopback application example using the HAL APIs (without lwIP)?

Thank you,

Ran

    This topic has been closed for replies.
    Best answer by Ozone


    6 replies

    Super User
    September 26, 2019

    Have you seen this thread?

    Super User
    September 26, 2019

    The better way is to give up on lwIP and STM32's HAL Ethernet libraries, both of which are maddeningly difficult to understand and debug, and use a WIZnet chip instead. But if you're stuck with those, good luck.

    Graduate II
    November 28, 2019

    STM32 HAL is buggy bloatware, but lwIP is pretty solid and stable. Look at my other post and demonstration firmware. WIZnet can't even come close to that performance, functionality and flexibility:

    https://www.pjrc.com/arduino-ethernet-library-2-0-0/

    November 28, 2019

    Do you sell the code, or what is your idea behind not releasing it?

    Visitor II
    November 28, 2019

    Hi ranran, I'm facing the same issue... Have you solved it? How?

    Thanks

    ranranAuthor
    Visitor II
    November 28, 2019

    Hi,

    I've actually been struggling with it for a long time.

    We are using Ethernet test equipment (are you?) and we observe the same packet loss with the EVAL board too!

    I think that the chip has some HW limitation (but I won't get confirmation of this from ST...): it just can't handle IEEE 802.3 standard packets! That's quite bad.

    Thanks for any idea

    OzoneAnswer
    Explorer
    November 28, 2019

    If you step back and widen your scope, you realize the same kind of question is raised on many MCU fora.

    Hardly any MCU of this (Cortex-M) class can keep up with real-world Ethernet traffic, except in sterilized test environments.

    MCUs must be relatively cheap to be successful. Ethernet, on the other hand, requires a lot of buffering and core performance.

    Just check the Ethernet peripherals of Cortex-A devices, or chips used for PCs. They typically store multiple packets, usually DMA-ed into the core memory space.

    Most MCUs implement just enough to get it going. But after all, IoT is a marketing fad.

    Graduate II
    November 28, 2019

    Dear @ranran​, @TDK​, @Ozone​ and everyone else interested in Ethernet!

    Don't take it personally, but the idea that Cortex-M class MCUs can't handle Ethernet and an IP stack decently is total nonsense, which unfortunately has proliferated through the STM32 and other communities. A typical Cortex-M3/M4 is more powerful than an Intel 80486, which was able to handle Windows 95, and a Cortex-M7 is on par with a Pentium II. On the RAM side, a few tens of KB for Ethernet and IP stack buffers are more than enough for a high-performance implementation, and modern MCUs have much more internal RAM, not to mention the possibility of adding external RAM.

    There are two main reasons leading to that false belief:

    1. ST's drivers and the lwIP integration in the examples and CubeMX-generated code. Everyone is just trying to use those, failing and blaming the hardware, but nobody is reading the reference manual or looking at the code. But that code is full of bugs, bloated, and inflexible because of its idiotic architecture. In other words: total crap! And that is a very polite way of describing it!
    2. A long-outdated understanding of Ethernet traffic distribution dating from the age of network hubs. Nowadays all Ethernet traffic is distributed by network switches, which learn MAC addresses and deliver frames only to the target device, not to every device. Broadcast frames are the exception, but in a normal real-world network those are transmitted from a few times per minute to a few times per second. Processing those is nothing for a Cortex-M class MCU.

    Therefore, when the software is done right, Ethernet and lwIP on STM32 MCUs work spectacularly well! And to prove it to others, I've made a demonstration firmware:

    https://community.st.com/s/question/0D50X0000AhNBoWSQW/actually-working-stm32-ethernet-and-lwip-demonstration-firmware

    ranranAuthor
    Visitor II
    November 29, 2019

    Hello Piranha (Community Member),

    Thank you for sharing the demonstration.

    I see that you tested it with iperf from another PC host.

    We have tested performance using Ethernet test equipment.

    1. I am not sure that PC iperf can reach the performance required by IEEE 802.3. The test equipment generates packets with the interval between packets exactly per the IEEE 802.3 requirement (the interval timing is accurate and we cannot change it even if we want to).
    2. Another thing to note: what is the packet length in your tests? Did you try packets below ~150 bytes?

    Thank you!

    ranran

    Graduate II
    December 4, 2019

    > But can it handle the average 1000 Mbit/s office network? Usually they are already drowned by the irrelevant background chatter on the link layer level.

    It seems that this is a rather popular fundamental misconception about Ethernet networking. I've seen this misconception in posts from @Ozone​, @Community member​ and other users. In my previous post I already wrote about the "switch vs hub" question, but let's expand on it.

    Long ago, networked devices were connected with network hubs and all had to work synchronously at the same speed and mode (full or half duplex). As a hub is a simple repeater, all connected devices in a single collision domain shared the total throughput, and every frame was sent to every device. But that was 20+ years ago, when 10 Mbps networks were the most popular ones! In the 21st century hubs are obsolete and are no longer used or produced.

    Nowadays all Ethernet networks are connected with network switches, which are totally different "store and forward" type devices. A switch has internal RAM, where it captures an incoming frame, processes the MAC addresses and transmits the frame on the corresponding port. To do this, the switch also has internal RAM for a MAC address table, where it stores the learned addresses of network devices. That's why even simple switches have such specifications as MAC address table size, maximum frame size and frame forwarding rate. Because of the "store and forward" technique, a switch can connect devices with different speeds and modes. And, because of MAC address learning, the switch almost never "spams" devices with irrelevant traffic.

    Therefore a typical office network is not 1000 Mbps, nor 100 Mbps, nor any other particular speed, but a mix of different speeds, modes and physical interfaces, including optical and wireless. Every link between two devices has its own speed and mode. If a desktop PC uploads a large file to a NAS (network-attached storage) through a 1 Gbps interface at almost full speed, and the same PC through the same network interface streams a live 128 kbps MP3 stream to a 100 Mbps capable MCU-based device connected to that same switch, then the MCU device needs to handle only that MP3 stream. It doesn't even need to be able to handle the full 100 Mbps, but only the data rate needed for its actual task and a bit more. Moreover, the MCU device will not even be aware of, or able to detect, the PC-to-NAS traffic.

    The only exception, or "background" traffic, that is delivered to all devices in a network is broadcast frames (targeted to all devices). These are required mostly for the discovery tasks of DHCP, ARP, DNS and some other protocols. As stated before, these range from a few frames per minute to a few frames per second. To put this in contrast, my STM32F7-based demonstration firmware is capable of handling 8000+ unicast (targeted to the particular device) TCP frames per second at a 33% CPU load. Broadcast frames are almost always UDP, and UDP processing takes roughly half the CPU of TCP. Dropping irrelevant frames early in the IP stack takes even less. Therefore broadcast or "background" traffic really is nothing for Cortex-M class devices.