STM32F7: ETH TCP checksum offload fails
Hello,
Calling on ST staff and all Ethernet experts, please.
STM32F767,
Nucleo-144, and custom board
no OS
lwIP 2.1.3, IPv4 only
STM32CubeIDE
no ETH interrupts used
Application:
Industrial frontend,
streaming "audio" data from SAIs via ethernet,
at high data rates, for long periods of time (weeks)
TCP is a must, losing packets is not an option.
Audio streaming mostly uses UDP, with interpolation
to mask lost packets; we are not allowed to do that.
Problem:
ETH transmission: the TCP header checksum field goes out as ZERO (0).
Depending on settings below (SRAM usage, CPU clock),
this happens after a few MB, or many GB of data,
at 25.6 Mbps it sometimes runs for hours,
sometimes it stops after a few minutes.
The IPv4 header checksum is okay (at least not 0).
All checked with Wireshark.
Then the PC side stops ACKnowledging,
then lwIP shuts down TCP.
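
To double-check a captured frame independently of Wireshark, the expected TCP checksum can be recomputed in software. The following is only a minimal sketch, not taken from the attached code: the function name tcp_expected_checksum is hypothetical, and the input is assumed to be a raw IPv4 packet starting at the IP header (Ethernet header already stripped). It uses the standard one's-complement Internet checksum over the pseudo-header and the TCP segment, and ignores whatever is stored in the checksum field.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum, as big-endian 16-bit words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *data, size_t len)
{
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)data[0] << 8;   /* odd trailing byte, zero padded */
    }
    return sum;
}

/* Fold the carries back in and return the one's complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}

/*
 * Hypothetical helper: compute the TCP checksum an IPv4 packet should carry.
 * ip  points at the IPv4 header, len is the number of valid bytes.
 * Returns 0 and stores the result in *out, or -1 if the packet is not
 * a well-formed IPv4/TCP packet.
 */
int tcp_expected_checksum(const uint8_t *ip, size_t len, uint16_t *out)
{
    if (ip == NULL || out == NULL || len < 20) {
        return -1;
    }
    size_t ihl   = (size_t)(ip[0] & 0x0F) * 4;        /* IP header length */
    size_t total = ((size_t)ip[2] << 8) | ip[3];      /* IP total length  */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != 6 ||
        total > len || total < ihl + 20) {
        return -1;
    }
    const uint8_t *tcp = ip + ihl;
    size_t tcp_len = total - ihl;

    uint32_t sum = 0;
    sum = csum_add(sum, ip + 12, 8);             /* pseudo-header: src + dst  */
    sum += 6;                                    /* zero byte + protocol (6)  */
    sum += (uint32_t)tcp_len;                    /* pseudo-header: TCP length */
    sum = csum_add(sum, tcp, 16);                /* header before checksum    */
    sum = csum_add(sum, tcp + 18, tcp_len - 18); /* skip the checksum field   */
    *out = csum_fold(sum);
    return 0;
}
```

Comparing this value with bytes 16..17 of the TCP header in a capture would show whether the field is merely wrong or really left untouched at 0.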
Checked:
- Same behaviour on Nucleo and custom board.
- Checked on 2 different PCs.
- LwIP stats don't show any errors.
- "Transmit Store and Forward" is set in DMAOMR.
- Transmit FIFO is deep enough for packets (1514 B, no bigger packets).
- Checksum offload (CIC = ETH_DMATXDESC_CIC_TCPUDPICMP_FULL) is activated in all TX descriptor Status registers (BTW, there's a documentation error in RM0410 which says CIC bits are 28..27 in DESC1, page 1785).
- Header checksum fields from lwIP are definitely 0 before the frames are handed to the ETH DMA.
- Payload checksum error status bit in the Transmit Status vector is NEVER set.
- Memory barriers are used as recommended (DMB, DSB).
- RM says reasons for checksum failure might be:
  - no end of frame written to FIFO
  - incorrect length
- Neither of these happens, I checked all descriptors.
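
For reference, the kind of descriptor check meant above can be sketched as follows. This is a simplified illustration, not the attached code: the struct tx_desc_t and the function name are hypothetical, and the bit masks are assumed to mirror the STM32F7 HAL definitions (ETH_DMATXDESC_CIC and ETH_DMATXDESC_CIC_TCPUDPICMP_FULL, both 0x00C00000, i.e. bits 23:22 of the first descriptor word). Please verify them against stm32f7xx_hal_eth.h.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: values copied from the STM32F7 HAL header, verify before use. */
#define TXDESC_CIC_MASK 0x00C00000u   /* checksum insertion control, bits 23:22 */
#define TXDESC_CIC_FULL 0x00C00000u   /* IP header + payload + pseudo-header    */

/* Hypothetical, simplified view of one normal TX descriptor (4 words). */
typedef struct {
    volatile uint32_t Status;          /* first word: OWN, CIC, status bits */
    uint32_t ControlBufferSize;
    uint32_t Buffer1Addr;
    uint32_t Buffer2NextDescAddr;
} tx_desc_t;

/* Count the descriptors in a ring whose CIC field is not "full offload". */
size_t count_desc_without_full_cic(const tx_desc_t *ring, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        if ((ring[i].Status & TXDESC_CIC_MASK) != TXDESC_CIC_FULL) {
            bad++;
        }
    }
    return bad;
}
```

A result of 0 at the time of the failure would confirm that the offload request is still present in every descriptor.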
Findings:
Settings so far with an impact on the "zero-checksum-failure":
- CPU clock -> lower = better
- usage of internal SRAM memory areas, use of DTCM / SRAM1
Best "setup" until now:
- CPU clock reduced to 192 MHz (216 is max for F767)
- no use of DTCM, which costs 1/4 of the internal SRAM
-> Much better, but still not perfect: it failed on one board after about 1 hour.
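
To make sure nothing ETH-related ends up in DTCM by accident, buffer and descriptor addresses can be checked at startup. A minimal sketch, with a hypothetical function name; the DTCM range (128 KB at 0x20000000) is my assumption for the STM32F76x memory map and should be verified against the reference manual and the linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: STM32F76x DTCM RAM is 128 KB starting at 0x20000000. */
#define DTCM_BASE 0x20000000u
#define DTCM_SIZE (128u * 1024u)

/* Return true if the region [addr, addr + len) touches the DTCM range. */
bool region_overlaps_dtcm(uint32_t addr, uint32_t len)
{
    if (len == 0) {
        return false;
    }
    uint64_t start     = addr;
    uint64_t end       = (uint64_t)addr + len;            /* exclusive */
    uint64_t dtcm_end  = (uint64_t)DTCM_BASE + DTCM_SIZE; /* exclusive */
    return start < dtcm_end && end > DTCM_BASE;
}
```

On the target this could be called with the addresses of the TX/RX descriptor tables and packet buffers (cast to uint32_t), for example inside an assert during initialization.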
Having used FPGAs for years, I had hoped to leave these
"assumed race conditions" behind (naive me...). At least in an
FPGA you can get control of these problems.
For the final product, right now it seems the STM32F7 is not an option.
Which is sad, after having spent a lot of time on that one, having the
firmware at about 99% finished.
So, what am I doing wrong?
Or is there a known issue?
Source code of ethernetif.c etc. attached.
P.S.: I spammed the code with lots of __DSB()... I think I can remove many of these. But as it seems to be memory related, I had some hope.
