Graduate II
November 30, 2022
Question

STM32F7: ETH TCP checksum offload fails

  • 22 replies
  • 5956 views

Hello,

To ST staff and all Ethernet experts:

STM32F767, 

Nucleo-144, and custom board

no OS 

lwIP 2.1.3, IPv4 only

STM32CubeIDE

no ETH interrupts used

Application

Industrial frontend, 

streaming "audio" data from SAIs via ethernet,

at high data rates, for long periods of time (weeks)

TCP is a must, losing packets is not an option.

Audio streaming mostly uses UDP, and they use interpolation

to mask lost packets - we are not allowed to do that.

Problem:

ETH transmission: the TCP header checksum goes out on the wire as ZERO (0x0000).

Depending on settings below (SRAM usage, CPU clock),

this happens after a few MB, or many GB of data, 

at 25.6 Mbps it's running sometimes for hours, 

sometimes it stops after a few minutes.

IPv4 header checksum is okay (at least not 0).

All checked with Wireshark.

Then the PC side stops ACKnowledging, 

then lwIP shuts down TCP.
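
For cross-checking a captured segment offline, the value the MAC should have inserted can be recomputed in software. Below is a minimal sketch of the standard Internet checksum (RFC 1071) applied to the IPv4 pseudo-header plus the TCP segment. The function names `inet_checksum` and `tcp_checksum_ipv4` are made up for this illustration; they are not lwIP or HAL API, and the code assumes segments small enough for an Ethernet frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Add a byte range to a running sum of big-endian 16-bit words.
 * An odd trailing byte is treated as the high byte of a final word. */
uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len != 0) {
        acc += (uint32_t)p[0] << 8;
    }
    return acc;
}

/* Fold the carries back in and return the one's complement. */
uint16_t fold_and_invert(uint32_t acc)
{
    while ((acc >> 16) != 0) {
        acc = (acc & 0xFFFFu) + (acc >> 16);
    }
    return (uint16_t)~acc;
}

/* Plain Internet checksum over a buffer. */
uint16_t inet_checksum(const uint8_t *p, size_t len)
{
    return fold_and_invert(sum16(p, len, 0));
}

/* TCP checksum for IPv4: pseudo-header (source address, destination
 * address, protocol 6, TCP length) followed by the TCP header and payload.
 * With the checksum field zeroed this returns the value to insert; with the
 * field filled in it returns 0 when the checksum is valid. */
uint16_t tcp_checksum_ipv4(const uint8_t src[4], const uint8_t dst[4],
                           const uint8_t *segment, size_t len)
{
    uint32_t acc = 0;
    acc = sum16(src, 4, acc);
    acc = sum16(dst, 4, acc);
    acc += 6u;              /* zero byte + protocol number (TCP) */
    acc += (uint32_t)len;   /* TCP length: header + payload */
    acc = sum16(segment, len, acc);
    return fold_and_invert(acc);
}
```

Run over a segment copied out of a capture, a result of 0 means the checksum on the wire was valid. A segment that left the MAC with the field still 0 will generally give a non-zero result, which is the value that should have been inserted.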

Checked:

  • Same behaviour on Nucleo and custom board. 
  • Checked on 2 different PCs.
  • LwIP stats don't show any errors.
  • "Transmit Store and Forward" is set in DMAOMR.
  • Transmit FIFO is deep enough for packets (1514 B, no bigger packets).
  • Checksum offload (CIC = ETH_DMATXDESC_CIC_TCPUDPICMP_FULL) is activated in all TX descriptor Status registers (BTW, there's a documentation error in RM0410 which says CIC bits are 28..27 in DESC1, page 1785).
  • The checksum fields written by lwIP are definitely = 0 before the frame is given to the ETH DMA.
  • Payload checksum error status bit in the Transmit Status vector is NEVER set.
  • Memory barriers are used as recommended (DMB, DSB).
  • RM says reasons for checksum failure might be:
    • no end of frame written to FIFO
    • incorrect length
    • Neither of these happens, I checked all descriptors.
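
The descriptor checks listed above can be sketched roughly as follows. This is a host-compilable illustration, not the attached ethernetif.c: the struct, the `SKETCH_*` names and the functions are hypothetical, and the bit positions are what I understand the F7 HAL header to use for the enhanced descriptor format (CIC in TDES0 bits 23:22), so please verify them against your HAL version and RM revision. `atomic_thread_fence` stands in for the CMSIS `__DSB()` that would be used on the target.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed TDES0 bit positions (check against your HAL / RM). */
#define SKETCH_TDES0_OWN      0x80000000u  /* descriptor owned by DMA      */
#define SKETCH_TDES0_LS       0x20000000u  /* last segment of frame        */
#define SKETCH_TDES0_FS       0x10000000u  /* first segment of frame       */
#define SKETCH_TDES0_CIC_MASK 0x00C00000u  /* checksum insertion control   */
#define SKETCH_TDES0_CIC_FULL 0x00C00000u  /* full TCP/UDP/ICMP insertion  */
#define SKETCH_TDES0_IHE      0x00010000u  /* IP header error (status)     */
#define SKETCH_TDES0_PCE      0x00001000u  /* payload checksum error       */
#define SKETCH_MAX_FRAME      1514u

/* On the target this would be __DSB(); here a portable full fence. */
#define SKETCH_BARRIER() atomic_thread_fence(memory_order_seq_cst)

typedef struct {
    volatile uint32_t tdes0;  /* status / control  */
    volatile uint32_t tdes1;  /* buffer sizes      */
    volatile uint32_t tdes2;  /* buffer 1 address  */
    volatile uint32_t tdes3;  /* next descriptor   */
} sketch_tx_desc_t;

/* Prepare a single-buffer frame and hand it to the DMA.
 * Returns 0 on success, -1 if the DMA still owns the descriptor,
 * -2 if the length is out of range. */
int sketch_tx_submit(sketch_tx_desc_t *d, uint32_t buf_addr, uint32_t len)
{
    assert(d != NULL);
    if ((d->tdes0 & SKETCH_TDES0_OWN) != 0u) {
        return -1;
    }
    if (len == 0u || len > SKETCH_MAX_FRAME) {
        return -2;
    }
    d->tdes2 = buf_addr;
    d->tdes1 = len & 0x1FFFu;

    /* Keep unrelated control bits (e.g. chaining), force FS/LS/CIC. */
    uint32_t ctrl = d->tdes0 &
        ~(SKETCH_TDES0_CIC_MASK | SKETCH_TDES0_FS | SKETCH_TDES0_LS);
    ctrl |= SKETCH_TDES0_FS | SKETCH_TDES0_LS | SKETCH_TDES0_CIC_FULL;
    d->tdes0 = ctrl;

    SKETCH_BARRIER();                     /* everything visible before OWN */
    d->tdes0 = ctrl | SKETCH_TDES0_OWN;   /* ownership goes to the DMA     */
    SKETCH_BARRIER();                     /* ...before the poll demand     */
    return 0;
}

/* After the DMA has released the descriptor: non-zero if it reported
 * an IP header error or a payload checksum error. */
int sketch_tx_checksum_error(const sketch_tx_desc_t *d)
{
    return (d->tdes0 & (SKETCH_TDES0_IHE | SKETCH_TDES0_PCE)) != 0u;
}
```

The point of writing OWN last, in its own store after a barrier, is that the DMA should never see an owned descriptor with a stale length or CIC field. Whether that ordering is the cause here is an open question.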

Findings:

Settings so far with an impact on the "zero-checksum-failure":

  • CPU clock -> lower = better
  • usage of internal SRAM memory areas, use of DTCM / SRAM1

Best "setup" until now:

  • CPU clock reduced to 192 MHz (216 is max for F767)
  • no use of DTCM - which makes it lose 1/4 of internal SRAM

-> Much better, but still not perfect, failed on one board after 1 hour or so.

Having used FPGAs for years, I had the hope of leaving these

"assumed race conditions" behind (naive me...). At least in the 

FPGA you can get control of these problems.

For the final product, right now it seems the STM32F7 is not an option.

Which is sad, after having spent a lot of time on that one, having the 

firmware at about 99% finished.

So, what am I doing wrong?

Or is there a known issue?

Source code of ethernetif.c etc. attached.

P.S.: I spammed the code with lots of __DSB()... I think I can remove many of these. But as it seems to be memory related, I had some hope.


    Super User
    December 15, 2022

    I understand your frustration, but it's very unlikely you will be able to get help here, as we are mere users with inevitably limited experience and zero access to inside information.

    The Synopsys modules (both ETH and OTG_USB) used in STM32 are not only very complex, but also quirky and laden with historical layers. The documentation is lacking and it's only partially ST's fault, as they mostly copy-paste what they purchased. To make things worse, the modules tend to change over time, i.e. they exist in various versions, and those have various quirks.

    Look for example at the "Successive write operations to the same register might not be fully taken into account" erratum in the STM32F407 errata...

    JW

    LCE (Author)
    Graduate II
    December 15, 2022

    Ah, that's where stuff like that in the HAL driver comes from:

    /* Wait until the write operation will be taken into account :
     at least four TX_CLK/RX_CLK clock cycles */
    tmpreg = (heth->Instance)->MACCR;
    HAL_Delay(ETH_REG_WRITE_DELAY);
    (heth->Instance)->MACCR = tmpreg;

    So maybe F4 & F7 ETH MAC are very much the same?

    Will check...

    Thanks again!

    LCE (Author)
    Graduate II
    December 15, 2022

    I changed register writing to this method, didn't change anything.

    BUT... as I use the 128 kB DTCM as "main" RAM, I changed some variables' alignment to "8", and so I did with the lwIP stuff.
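
A minimal sketch of what "alignment 8" can look like in C11, for anyone trying the same thing. The buffer and helper names are hypothetical and this is not the code from the attached project. On the lwIP side the corresponding knob is, as far as I know, `MEM_ALIGNMENT` in lwipopts.h.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALIGN     8u
#define SKETCH_FRAME_MAX 1514u

/* Round n up to the next multiple of a (a must be a power of two). */
size_t sketch_align_up(size_t n, size_t a)
{
    return (n + (a - 1u)) & ~(a - 1u);
}

/* Non-zero if p sits on an a-byte boundary (a must be a power of two). */
int sketch_is_aligned(const void *p, size_t a)
{
    return ((uintptr_t)p & (uintptr_t)(a - 1u)) == 0u;
}

/* Frame buffer forced onto an 8-byte boundary; the size is padded to a
 * multiple of 8 (1520) so that a following buffer stays aligned too. */
alignas(8) uint8_t sketch_tx_buf[(SKETCH_FRAME_MAX + 7u) & ~7u];
```

With GCC the same effect is available through `__attribute__((aligned(8)))`, optionally combined with a section attribute to place the buffer in a particular RAM region.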

    It has been running very stably for 3 hours now. Only when I access the webserver at the same time, with a page using SSI tags, does the checksum error show up quickly.

    Spamming the device with GET requests to non-SSI pages (either html from flash/RAM or JSON replies) is no problem at all.

    And in general, even on my work PC I have fewer buffer overflows.

    Okay, let's dig into lwIP's SSI stuff...