Graduate II
November 30, 2022
Question

STM32F7: ETH TCP checksum offload fails


Hello,

ST staff and all Ethernet experts, please help.

STM32F767, 

Nucleo-144, and custom board

no OS 

lwIP 2.1.3, IPv4 only

STM32CubeIDE

no ETH interrupts used

Application

Industrial frontend, 

streaming "audio" data from SAIs via ethernet,

at high data rates, for long periods of time (weeks)

TCP is a must, losing packets is not an option.

Audio streaming mostly uses UDP, and they use interpolation

to mask lost packets - we are not allowed to do that.

Problem:

ETH transmission: the TCP header checksum is sent as ZERO (0x0000).

Depending on settings below (SRAM usage, CPU clock),

this happens after a few MB, or many GB of data, 

at 25.6 Mbps it's running sometimes for hours, 

sometimes it stops after a few minutes.

IPv4 header checksum is okay (at least not 0).

All checked with Wireshark.

Then the PC side stops ACKnowledging, 

then lwIP shuts down TCP.

Checked:

  • Same behaviour on Nucleo and custom board. 
  • Checked on 2 different PCs.
  • LwIP stats don't show any errors.
  • "Transmit Store and Forward" is set in DMAOMR.
  • Transmit FIFO is deep enough for packets (1514 B, no bigger packets).
  • Checksum offload (CIC = ETH_DMATXDESC_CIC_TCPUDPICMP_FULL) is activated in all TX descriptor Status registers (BTW, there's a documentation error in RM0410 which says CIC bits are 28..27 in DESC1, page 1785).
  • The header checksum fields written by lwIP are definitely 0 before the frame is given to the ETH DMA.
  • Payload checksum error status bit in the Transmit Status vector is NEVER set.
  • Memory barriers are used as recommended (DMB, DSB).
  • RM says reasons for checksum failure might be:
    • no end of frame written to FIFO
    • incorrect length
    • All this does not happen, I checked all descriptors.
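For reference, the descriptor checks listed above can be written down as a small host-testable helper. This is only a sketch: the bit masks are the values I know from ST's HAL header (`ETH_DMATXDESC_OWN`, `_LS`, `_FS`, `_CIC_TCPUDPICMP_FULL`, `_TBS1`) for the normal descriptor format and should be verified against RM0410. `tx_desc_view_t`, `txchk_t` and `tx_chain_check()` are hypothetical names that are not part of the HAL or of the attached ethernetif.c.

```c
#include <stdint.h>
#include <stddef.h>

/* Bit masks as in ST's HAL header (normal descriptor format).
   Assumption: verify them against your revision of RM0410. */
#define TDES0_OWN       0x80000000u  /* descriptor is owned by the DMA      */
#define TDES0_LS        0x20000000u  /* last segment of the frame           */
#define TDES0_FS        0x10000000u  /* first segment of the frame          */
#define TDES0_CIC_FULL  0x00C00000u  /* CIC = 11b, full checksum insertion  */
#define TDES1_TBS1      0x00001FFFu  /* buffer 1 size, bits 12:0            */

/* Hypothetical view of the first two descriptor words (not the HAL struct). */
typedef struct {
    uint32_t tdes0;  /* control and status bits */
    uint32_t tdes1;  /* buffer sizes            */
} tx_desc_view_t;

typedef enum {
    TXCHK_OK = 0,
    TXCHK_EMPTY,       /* no descriptors given                       */
    TXCHK_BAD_FS,      /* FS missing on first or set on a later one  */
    TXCHK_BAD_LS,      /* LS missing on last or set on an earlier one */
    TXCHK_NO_CIC,      /* CIC is not 11b on some descriptor          */
    TXCHK_BAD_LENGTH   /* sum of TBS1 differs from the frame length  */
} txchk_t;

/* Check the n descriptors that make up ONE frame of frame_len bytes. */
txchk_t tx_chain_check(const tx_desc_view_t *d, size_t n, uint32_t frame_len)
{
    uint32_t total = 0;

    if (n == 0) {
        return TXCHK_EMPTY;
    }
    for (size_t i = 0; i < n; i++) {
        int fs = (d[i].tdes0 & TDES0_FS) != 0u;
        int ls = (d[i].tdes0 & TDES0_LS) != 0u;

        if (fs != (i == 0)) {
            return TXCHK_BAD_FS;
        }
        if (ls != (i == n - 1)) {
            return TXCHK_BAD_LS;
        }
        if ((d[i].tdes0 & TDES0_CIC_FULL) != TDES0_CIC_FULL) {
            return TXCHK_NO_CIC;
        }
        total += d[i].tdes1 & TDES1_TBS1;
    }
    return (total == frame_len) ? TXCHK_OK : TXCHK_BAD_LENGTH;
}
```

Called right before the OWN bits are set (for example under a debug switch), this would flag a frame whose FS/LS flags or summed buffer sizes do not match what lwIP handed over. It only uses buffer 1 of each descriptor, which is an assumption about how the driver fills them.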

Findings:

Settings so far with an impact on the "zero-checksum-failure":

  • CPU clock -> lower = better
  • usage of internal SRAM memory areas, use of DTCM / SRAM1

Best "setup" until now:

  • CPU clock reduced to 192 MHz (216 is max for F767)
  • no use of DTCM - which makes it lose 1/4 of internal SRAM

-> Much better, but still not perfect, failed on one board after 1 hour or so.

Having used FPGAs for years, I had the hope of leaving these

"assumed race conditions" behind (naive me...). At least in the 

FPGA you can get control of these problems.

For the final product, right now it seems the STM32F7 is not an option.

Which is sad, after having spent a lot of time on that one, having the 

firmware at about 99% finished.

So, what am I doing wrong?

Or is there a known issue?

Source code of ethernetif.c etc. attached.

P.S.: I spammed the code with lots of __DSB()... I think I can remove many of these. But as it seems to be memory related, I had some hope.


    LCEAuthor
    Graduate II
    December 3, 2022

    Just found out:

    TX FIFO flushing restores correct checksum.

    Still, after a while checksum = 0.

    EDIT:

    But at least I can get everything else back to work, like the http interface.

    So when the streaming PCB gets the connection aborted error, I immediately flush the TX FIFO, and the checksums and everything else are good again.

    Except for the data stream...

    And I know a little bit more:

    • so the data and the given length to the FIFO somehow do not correspond, but I wonder why the descriptor's error bit is not set
    • I can't remember any register giving any info about the FIFO status, so I have to read through the register descriptions again.
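A minimal sketch of a bounded TX FIFO flush, in case it helps to reproduce the workaround described above. It assumes that the flush is requested with the FTF bit (bit 20, `ETH_DMAOMR_FTF` in the CMSIS device header) of DMAOMR and that the hardware clears the bit again when the flush has finished; both should be checked against RM0410. The helper name `tx_fifo_flush()` and the poll limit are hypothetical. The register is passed as a pointer so the function can be tried on a PC with a plain variable.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: FTF is bit 20 of ETH_DMAOMR (ETH_DMAOMR_FTF in the CMSIS header). */
#define DMAOMR_FTF 0x00100000u

/* Request a TX FIFO flush and wait until the hardware clears FTF again.
   Returns true if the bit was seen cleared within max_polls reads,
   false on timeout (the request bit is left as the hardware has it). */
bool tx_fifo_flush(volatile uint32_t *dmaomr, uint32_t max_polls)
{
    *dmaomr |= DMAOMR_FTF;               /* other DMAOMR bits stay untouched */

    while (max_polls--) {
        if ((*dmaomr & DMAOMR_FTF) == 0u) {
            return true;                 /* flush reported complete */
        }
    }
    return false;                        /* still flushing: caller decides what to do */
}
```

On the target this would be called as `tx_fifo_flush(&ETH->DMAOMR, 100000u)`. Counting the polls (or reading a cycle counter around the call) also gives a number for how long the flush really takes.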

    LCEAuthor
    Graduate II
    December 5, 2022

    The problem is still there...

    I have thrown out all HAL setups and inits, went through all MAC & DMA registers, maybe it's a little bit better now. Sometimes it's running for hours, sometimes checksum = 0 after a few seconds.

    On my private laptop, which is using a USB/Ethernet bridge with a Microchip LAN9512 (100M), and which has only "normal" anti-virus programs running, and which is also newer and has more CPU power, the checksum error happens less often.

    On my work laptop, which is running (feels like) 100 anti-virus programs in the background, the problem occurs more often. So I have checked the Intel I219LM Ethernet adapter settings and played with these (incl. TCP checksum offload), but then things only got worse.

    But I can NOT see any external reasons in the network stream that might provoke the checksum error - apart from the fact that this wouldn't make sense because it's definitely the STM32 taking care of the TCP checksum, and the PHY or anything of the outside world is not involved.

    RM0410, page 1786, about TX checksums:

    "The result of this operation is indicated by the payload checksum error status bit in the Transmit Status vector (bit 12). The payload checksum error status bit is set when either of the following is detected:

    – the frame has been forwarded to the MAC transmitter in Store-and-forward mode without the end of frame being written to the FIFO

    – the packet ends before the number of bytes indicated by the payload length field in the IP header is received. 

    When the packet is longer than the indicated payload length, the bytes are ignored as stuff bytes, and no error is reported. 

    When the first type of error is detected, the TCP, UDP or ICMP header is not modified. 

    For the second error type, still, the calculated checksum is inserted into the corresponding header field." - quote end

    As no error bit is ever set, it could only be the payload length case from the quote above (highlighted in bold in the original post). But I checked the descriptors, and all length info is as it should be.
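The status check described in the quote could be sketched like this. The masks are the values from ST's HAL header (`ETH_DMATXDESC_IHE`, `_ES`, `_FF`, `_PCE`), where the payload checksum error is bit 12 of TDES0; this mapping and the remark that the status is only meaningful in the descriptor with the Last segment bit are assumptions to verify against RM0410. `tx_status_t` and `tx_status_decode()` are hypothetical names.

```c
#include <stdint.h>

/* Assumption: masks as in ST's HAL header, normal descriptor format. */
#define TDES0_OWN 0x80000000u  /* still owned by the DMA, status not valid yet */
#define TDES0_IHE 0x00010000u  /* IP header error                              */
#define TDES0_ES  0x00008000u  /* error summary                                */
#define TDES0_FF  0x00002000u  /* frame flushed                                */
#define TDES0_PCE 0x00001000u  /* payload checksum error (bit 12)              */

typedef struct {
    int pending;                 /* 1 while the DMA still owns the descriptor */
    int ip_header_error;
    int payload_checksum_error;
    int frame_flushed;
    int any_error;               /* error summary bit */
} tx_status_t;

/* Decode the checksum related status bits of a released TX descriptor.
   As far as I know, only the descriptor of the last segment carries them. */
tx_status_t tx_status_decode(uint32_t tdes0)
{
    tx_status_t s;

    s.pending                = (tdes0 & TDES0_OWN) != 0u;
    s.ip_header_error        = (tdes0 & TDES0_IHE) != 0u;
    s.payload_checksum_error = (tdes0 & TDES0_PCE) != 0u;
    s.frame_flushed          = (tdes0 & TDES0_FF)  != 0u;
    s.any_error              = (tdes0 & TDES0_ES)  != 0u;
    return s;
}
```

Logging the decoded status of every released descriptor, and not only of the ones belonging to the data stream, would show whether any frame ever reports IHE, PCE or FF shortly before the first zero checksum appears.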

    Very frustrating...

    Graduate II
    December 6, 2022

    > You can try broadcasting some simple constant UDP packet and capture it with Wireshark. Then compare the before and after versions and check whether the checksum field is the only one that differs or some other bytes also differ.

    Do this and report which bytes exactly differ.

    > Why's "Newlib's" printf so bad?

    Because it is written and optimized for speed on PCs with a relatively huge virtual memory. That implementation should not be used in an MCU environment. You can read about its problems in this, this and this topic. And there is a good discussion about the topic on the EEVblog forum. Decent alternatives are: eyalroz/printf, LwPRINTF, nanoprintf.

    LCEAuthor
    Graduate II
    December 6, 2022

    @Piranha thanks again for having a look at this.

    And indeed, I overlooked that the IPv4 header checksum also fails; otherwise the UDP packets before and after the failure are identical (except for the IPv4 ID, which is expected to change).

    So I'll check whether the IPv4 header checksum is always set to 0 before the frame is given to the MAC.

    UDP echo test 1
     
    "0000"
     
    before failure:
    0000 d4 81 d7 86 d8 45 ce 22 29 43 04 20 08 00 45 00 
    0010 00 20 00 7e 00 00 ff 11 d5 8f c0 a8 b2 46 c0 a8 
    0020 b2 27 00 07 ed 91 00 0c cc 1d 30 30 30 30 00 00 
    0030 00 00 00 00 00 00 00 00 00 00 00 00 
     
    after failure:
    0000 d4 81 d7 86 d8 45 ce 22 29 43 04 20 08 00 45 00 
    0010 00 20 45 2d 00 00 ff 11 00 00 c0 a8 b2 46 c0 a8 
    0020 b2 27 00 07 ed 91 00 0c 00 00 30 30 30 30 00 00 
    0030 00 00 00 00 00 00 00 00 00 00 00 00 
     
    => IP4 and UDP checksums = 0
    different IP4 ID (okay)
    otherwise identical
     
    --------------------------------------------------------
    UDP echo test 2
     
    "01234567890123456789"
     
    before failure:
    0000 d4 81 d7 86 d8 45 ce 22 29 43 04 20 08 00 45 00 
    0010 00 30 00 81 00 00 ff 11 d5 7c c0 a8 b2 46 c0 a8 
    0020 b2 27 00 07 ed 91 00 1c 22 4a 30 31 32 33 34 35 
    0030 36 37 38 39 30 31 32 33 34 35 36 37 38 39 
     
    after failure:
    0000 d4 81 d7 86 d8 45 ce 22 29 43 04 20 08 00 45 00 
    0010 00 30 45 2c 00 00 ff 11 00 00 c0 a8 b2 46 c0 a8 
    0020 b2 27 00 07 ed 91 00 1c 00 00 30 31 32 33 34 35 
    0030 36 37 38 39 30 31 32 33 34 35 36 37 38 39 
     
    => IP4 and UDP checksums = 0
    different IP4 ID (okay)
    otherwise identical
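To double-check captures like the ones above without relying on Wireshark's verdict, the expected values can be recomputed with the standard Internet checksum (one's-complement sum of 16-bit words, RFC 1071). The sketch below assumes IPv4 without fragmentation and a complete UDP datagram in one buffer; the function names are hypothetical and not taken from lwIP.

```c
#include <stdint.h>
#include <stddef.h>

/* Add len bytes to acc as big-endian 16-bit words (RFC 1071). */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t acc)
{
    while (len > 1) {
        acc += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len) {
        acc += (uint32_t)p[0] << 8;   /* odd trailing byte, padded with zero */
    }
    return acc;
}

/* Fold the carries back in and return the one's complement. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16) {
        acc = (acc & 0xFFFFu) + (acc >> 16);
    }
    return (uint16_t)~acc;
}

/* Expected IPv4 header checksum; ip points at the first byte of the IPv4 header.
   The checksum field itself (bytes 10 and 11) is skipped, so it may hold anything. */
uint16_t ipv4_hdr_checksum(const uint8_t *ip)
{
    size_t ihl = (size_t)(ip[0] & 0x0Fu) * 4u;
    uint32_t acc = sum16(ip, 10, 0);

    acc = sum16(ip + 12, ihl - 12, acc);
    return fold(acc);
}

/* Expected UDP checksum over pseudo-header, UDP header and data.
   The UDP checksum field (bytes 6 and 7 of the UDP header) is skipped. */
uint16_t udp_checksum(const uint8_t *ip)
{
    size_t ihl = (size_t)(ip[0] & 0x0Fu) * 4u;
    const uint8_t *udp = ip + ihl;
    size_t udp_len = ((size_t)udp[4] << 8) | udp[5];
    uint32_t acc = sum16(ip + 12, 8, 0);      /* source and destination address */
    uint16_t c;

    acc += 17u;                               /* protocol number of UDP */
    acc += (uint32_t)udp_len;                 /* UDP length             */
    acc = sum16(udp, 6, acc);                 /* ports and length field */
    acc = sum16(udp + 8, udp_len - 8, acc);   /* payload                */
    c = fold(acc);
    return c ? c : 0xFFFFu;                   /* a computed 0 is sent as 0xFFFF */
}
```

Fed with the bytes of "UDP echo test 1" starting at offset 0x0e (the `45 00 00 20 ...` part), these functions give 0xd58f for the IPv4 header and 0xcc1d for UDP, which matches the "before failure" capture. For an "after failure" frame they return what the MAC should have inserted in place of the zeros.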
     

    printf:

    I'll have a look at this, as I really need this for the Http interface.

    But for now with the standard printf:

    As long as it is not called while streaming, it can't be the reason for that failure, right?

    Also, if heap is big enough, this also should not be an issue?

    general: (mostly thinking out loud)

    The fact that flushing the TX FIFO cures the problem (checksums are okay again) hints at some problem with the packet length: the size the FIFO is told vs. the real size.

    With lwIP's TCP each packet comes with (at least) 2 chained packet buffers (pbufs) which are built in tcp_write():

    • the header: pbuf->payload = IP + TCP header
    • the data: pbuf->payload = "user data"

    These are put on the TCP PCB's unsent queue, which is again "emptied" by tcp_output() with calling tcp_output_segment(), which "finalizes" the header and calls the hardware output function.

    There the descriptors are set, concerning the checksum offload most importantly the First / Last segment bits and the length info.

    In store-and-forward mode the FIFO should only be given to the MAC when the frame is complete = Last segment bit set, as long as the packet is smaller than FIFO size.

    Which should be the case, with packets of max 1514 bytes vs 2kB FIFO.
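The length reasoning above can be turned into a check that runs in the hardware output function before the descriptors are filled. This is a sketch with a simplified stand-in for lwIP's `struct pbuf` (only `next`, `payload`, `tot_len` and `len`), so it compiles without lwIP. It assumes plain Ethernet II framing (14 byte header, no VLAN tag, `ETH_PAD_SIZE` of 0) and that the Ethernet and IPv4 headers sit in the first buffer of the chain. `struct pbuf_view` and `frame_len_check()` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for lwIP's struct pbuf, only the fields used here. */
struct pbuf_view {
    struct pbuf_view *next;
    const uint8_t    *payload;
    uint16_t          tot_len;  /* this buffer plus all following ones */
    uint16_t          len;      /* this buffer only                    */
};

#define ETH_HDR_LEN   14u       /* assumption: no VLAN tag, no padding */
#define IPV4_MIN_HDR  20u

/* Returns  0 if the chain is consistent,
           -1 if the first buffer is too short to hold the headers,
           -2 if the sum of all len fields differs from tot_len,
           -3 if the chain length differs from 14 + IPv4 total length. */
int frame_len_check(const struct pbuf_view *p)
{
    const struct pbuf_view *q;
    uint32_t sum = 0;
    uint32_t ip_total;

    if (p == NULL || p->len < ETH_HDR_LEN + IPV4_MIN_HDR) {
        return -1;
    }
    for (q = p; q != NULL; q = q->next) {
        sum += q->len;
    }
    if (sum != p->tot_len) {
        return -2;
    }
    /* IPv4 total length field: bytes 2 and 3 of the IP header. */
    ip_total = ((uint32_t)p->payload[ETH_HDR_LEN + 2] << 8)
             | p->payload[ETH_HDR_LEN + 3];
    if (sum != ETH_HDR_LEN + ip_total) {
        return -3;
    }
    return 0;
}
```

A non-zero return for any frame (TCP stream, http, ARP excluded since it is not IPv4) would point at the software side handing the MAC a length that disagrees with the IP header, which is the case the RM describes for checksum problems.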

    LCEAuthor
    Graduate II
    December 6, 2022

    IPv4 checksum:

    now, that's interesting:

    the 1st TCP packet that fails with TCP checksum = 0 still has a valid IPv4 checksum.

    After this packet also the IPv4 checksum fails with 0.

    Does that tell us anything except that I have some TX FIFO / size / length / whatever problem?

    This thing drives me crazy...

    Super User
    December 6, 2022

    > the 1st TCP packet that fails the TCP checksum = 0, still has a valid IPv4 checksum.

    Is there anything particularly interesting/special/unusual in that packet? Can you perhaps post it?

    JW

    LCEAuthor
    Graduate II
    December 6, 2022

    Right now it's stable as hell, even when I spam it with extra http requests like crazy...

    I "only" reduced TCP_MSS from 1460 (maximum) to 1220 and accordingly the SAI buffer size.

    I'll set that back to 1460 and post the latest packets.

    LCEAuthor
    Graduate II
    December 6, 2022

    @Community member​ Thanks again for having another look at it.

    Here are the 2 last good packets (Wireshark's "header analysis" for better readability), then the 1st bad packet (TCP cs = 0), and the 2nd bad packet (IPv4 cs = 0).

    I have checked the payload, that's okay, starts with a 20 byte header incl. timestamp, packet number, then comes audio data.

    Summary:

    All is well and as it should be. Except for that "§%!* checksums. ;)

    2nd last good packet:
     
    Frame 40177: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits)
    Ethernet II, Src: ce:22:29:43:04:20 (ce:22:29:43:04:20), Dst: Dell_Laptop
    Internet Protocol Version 4, Src: 192.168.178.70, Dst: 192.168.178.39
     0100 .... = Version: 4
     .... 0101 = Header Length: 20 bytes (5)
     Differentiated Services Field: 0xb8 (DSCP: EF PHB, ECN: Not-ECT)
     Total Length: 1500
     Identification: 0x68d7 (26839)
     000. .... = Flags: 0x0
     ...0 0000 0000 0000 = Fragment Offset: 0
     Time to Live: 255
     Protocol: TCP (6)
     Header Checksum: 0x66cd [correct]
     [Header checksum status: Good]
     [Calculated Checksum: 0x66cd]
     Source Address: 192.168.178.70
     Destination Address: 192.168.178.39
    Transmission Control Protocol, Src Port: 9603, Dst Port: 52127, Seq: 38951341, Ack: 1, Len: 1460
     Source Port: 9603
     Destination Port: 52127
     [Stream index: 4]
     [Conversation completeness: Incomplete (12)]
     [TCP Segment Len: 1460]
     Sequence Number: 38951341 (relative sequence number)
     Sequence Number (raw): 38957876
     [Next Sequence Number: 38952801 (relative sequence number)]
     Acknowledgment Number: 1 (relative ack number)
     Acknowledgment number (raw): 285078846
     0101 .... = Header Length: 20 bytes (5)
     Flags: 0x018 (PSH, ACK)
     Window: 5840
     [Calculated window size: 5840]
     [Window size scaling factor: -1 (unknown)]
     Checksum: 0x4506 [correct]
     [Checksum Status: Good]
     [Calculated Checksum: 0x4506]
     Urgent Pointer: 0
     [Timestamps]
     [SEQ/ACK analysis]
     TCP payload (1460 bytes)
    Data (1460 bytes) looks good (easily identified by own header before audio data)
    _______________________________________________________________________________
     
    last good packet:
     
    Frame 40178: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits) 
    Ethernet II, Src: ce:22:29:43:04:20 (ce:22:29:43:04:20), Dst: Dell_Laptop
    Internet Protocol Version 4, Src: 192.168.178.70, Dst: 192.168.178.39
     0100 .... = Version: 4
     .... 0101 = Header Length: 20 bytes (5)
     Differentiated Services Field: 0xb8 (DSCP: EF PHB, ECN: Not-ECT)
     Total Length: 1500
     Identification: 0x68d8 (26840)
     000. .... = Flags: 0x0
     ...0 0000 0000 0000 = Fragment Offset: 0
     Time to Live: 255
     Protocol: TCP (6)
     Header Checksum: 0x66cc [correct]
     [Header checksum status: Good]
     [Calculated Checksum: 0x66cc]
     Source Address: 192.168.178.70
     Destination Address: 192.168.178.39
    Transmission Control Protocol, Src Port: 9603, Dst Port: 52127, Seq: 38952801, Ack: 1, Len: 1460
     Source Port: 9603
     Destination Port: 52127
     [Stream index: 4]
     [Conversation completeness: Incomplete (12)]
     [TCP Segment Len: 1460]
     Sequence Number: 38952801 (relative sequence number)
     Sequence Number (raw): 38959336
     [Next Sequence Number: 38954261 (relative sequence number)]
     Acknowledgment Number: 1 (relative ack number)
     Acknowledgment number (raw): 285078846
     0101 .... = Header Length: 20 bytes (5)
     Flags: 0x018 (PSH, ACK)
     Window: 5840
     [Calculated window size: 5840]
     [Window size scaling factor: -1 (unknown)]
     Checksum: 0x838e [correct]
     [Checksum Status: Good]
     [Calculated Checksum: 0x838e]
     Urgent Pointer: 0
     [Timestamps]
     [SEQ/ACK analysis]
     TCP payload (1460 bytes)
    Data (1460 bytes) looks good (easily identified by own header before audio data)
    _______________________________________________________________________________
     
    1st bad packet: TCP checksum = 0
     
    Frame 40179: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits) 
    Ethernet II, Src: ce:22:29:43:04:20 (ce:22:29:43:04:20), Dst: Dell_Laptop
    Internet Protocol Version 4, Src: 192.168.178.70, Dst: 192.168.178.39
     0100 .... = Version: 4
     .... 0101 = Header Length: 20 bytes (5)
     Differentiated Services Field: 0xb8 (DSCP: EF PHB, ECN: Not-ECT)
     Total Length: 1500
     Identification: 0x68d9 (26841)
     000. .... = Flags: 0x0
     ...0 0000 0000 0000 = Fragment Offset: 0
     Time to Live: 255
     Protocol: TCP (6)
     Header Checksum: 0x66cb [correct]
     [Header checksum status: Good]
     [Calculated Checksum: 0x66cb]
     Source Address: 192.168.178.70
     Destination Address: 192.168.178.39
    Transmission Control Protocol, Src Port: 9603, Dst Port: 52127, Seq: 38954261, Ack: 1, Len: 1460
     Source Port: 9603
     Destination Port: 52127
     [Stream index: 4]
     [Conversation completeness: Incomplete (12)]
     [TCP Segment Len: 1460]
     Sequence Number: 38954261 (relative sequence number)
     Sequence Number (raw): 38960796
     [Next Sequence Number: 38955721 (relative sequence number)]
     Acknowledgment Number: 1 (relative ack number)
     Acknowledgment number (raw): 285078846
     0101 .... = Header Length: 20 bytes (5)
     Flags: 0x018 (PSH, ACK)
     Window: 5840
     [Calculated window size: 5840]
     [Window size scaling factor: -1 (unknown)]
     Checksum: 0x0000 incorrect, should be 0x9619(maybe caused by "TCP checksum offload"?)
     [Checksum Status: Bad]
     [Calculated Checksum: 0x9619]
     Urgent Pointer: 0
     [Timestamps]
     [SEQ/ACK analysis]
     TCP payload (1460 bytes)
    Data (1460 bytes) looks good (easily identified by own header before audio data)
    _______________________________________________________________________________
     
    2nd bad packet: TCP and IPv4 checksum = 0
     
    Frame 40180: 1514 bytes on wire (12112 bits), 1514 bytes captured (12112 bits)
    Ethernet II, Src: ce:22:29:43:04:20 (ce:22:29:43:04:20), Dst: Dell_Laptop
    Internet Protocol Version 4, Src: 192.168.178.70, Dst: 192.168.178.39
     0100 .... = Version: 4
     .... 0101 = Header Length: 20 bytes (5)
     Differentiated Services Field: 0xb8 (DSCP: EF PHB, ECN: Not-ECT)
     Total Length: 1500
     Identification: 0x68da (26842)
     000. .... = Flags: 0x0
     ...0 0000 0000 0000 = Fragment Offset: 0
     Time to Live: 255
     Protocol: TCP (6)
     Header Checksum: 0x0000 incorrect, should be 0x66ca(may be caused by "IP checksum offload"?)
     [Header checksum status: Bad]
     [Calculated Checksum: 0x66ca]
     Source Address: 192.168.178.70
     Destination Address: 192.168.178.39
    Transmission Control Protocol, Src Port: 9603, Dst Port: 52127, Seq: 38955721, Ack: 1, Len: 1460
     Source Port: 9603
     Destination Port: 52127
     [Stream index: 4]
     [Conversation completeness: Incomplete (12)]
     [TCP Segment Len: 1460]
     Sequence Number: 38955721 (relative sequence number)
     Sequence Number (raw): 38962256
     [Next Sequence Number: 38957181 (relative sequence number)]
     Acknowledgment Number: 1 (relative ack number)
     Acknowledgment number (raw): 285078846
     0101 .... = Header Length: 20 bytes (5)
     Flags: 0x018 (PSH, ACK)
     Window: 5840
     [Calculated window size: 5840]
     [Window size scaling factor: -1 (unknown)]
     Checksum: 0x0000 incorrect, should be 0x1801(maybe caused by "TCP checksum offload"?)
     [Checksum Status: Bad]
     [Calculated Checksum: 0x1801]
     Urgent Pointer: 0
     [Timestamps]
     [SEQ/ACK analysis]
     TCP payload (1460 bytes)
    Data (1460 bytes) looks good (easily identified by own header before audio data)

    LCEAuthor
    Graduate II
    December 6, 2022

    What I don't get:

    It's been perfectly stable now with reduced TCP_MSS (and reduced SAI buffer size).

    Super User
    December 6, 2022

    I don't see anything suspicious in those packets, and I have no more ideas.

    > It's been perfectly stable now with reduced TCP_MSS (and reduced SAI buffer size).

    That of course may or may not be coincidental...

    JW

    LCEAuthor
    Graduate II
    December 6, 2022

    @Community member​  Thanks again!

    > That of course may or may not be coincidental...

    That's the problem!

    I'm going through the lwIP output functions again, where TCP_MSS might have an impact.

    LCEAuthor
    Graduate II
    December 6, 2022

    Same error, also with reduced MSS, but it ran much longer considering the circumstances (lots of parallel http and network traffic).

    I checked all interrupts again, and I actually found the IGMP timer, which might have called the output function. I changed it to set a flag so that igmp_tmr() is called in main, but that didn't help.

    But now I'm quite sure there are:

    • no calls to lwIP functions from interrupts
    • no (s)printf when data streaming
    • interrupts are disabled while the TX descriptors are prepared, until after writing to the poll register DMATPDR to resume transmission

    So far it happens less when - but I'm not so sure...:

    • DTCM is not used
    • CPU clock is below max 216 MHz
    • TCP_MSS < max 1460
    • more parallel http access (no interrupts used)
    • more network traffic on PC side

    There must be something stupid somewhere...

    Again, current ethernetif.c & co attached.

    LCEAuthor
    Graduate II
    December 8, 2022

    I think I found it....

    After turning off all the stuff not needed for TCP transfers, including the source SAIs and sending only static buffers, nothing changed, still zero checksum.

    Then I went a few weeks back when I did not yet have good software for testing on the PC side, at that time I built a packet queue between SAI DMA buffers and tcp_write().

    And there's some "hole" which somehow might access the buffers given to ETH DMA.

    Anyway, without the queue, transfers are whacky, but never fail due to checksum.

    Although this needs some more testing, didn't have enough time this morning and had to get on the road - company's christmas party...

    If that's it I'll come back, report, and mark topic as solved.

    @Piranha​  & @Community member​ big thanks again for your help!

    LCEAuthor
    Graduate II
    December 14, 2022

    Again, coincidental...

    I still have the checksum problem.

    And I changed a lot over the last few days.

    Let me start with a confession, and I really feel stupid about how I approached the STM32 in the first place:

    I have worked with FPGAs and "slow" 8-bit controllers for years, together with some specialized interface ICs (e.g., high-speed USB).

    So any time-critical stuff happened in the FPGA and between that and the "interface IC".

    When working with FPGAs in VHDL, I basically had control over each bit, at each clock edge.

    Being fed up with too many ICs (FPGA, SRAM, uC, IF-IC, ...) and too expensive FPGAs, we decided to try an all-in-one solution, and because I had some previous (good) experience with the STM32, we chose that one.

    So far, so good.

    Problem: I somehow approached the STM32 as if it were some kind of magic device: throw in some register values, then let it run...

    Which means I did not worry about things like SRAM and DMA usage - where's what, how is it connected.

    AND even worse, being used to the slow non-OS 8-bitters, I just threw everything in the main loop as I was used to. With the result that some things were constantly checked at full clock speed - absolutely not necessary, and surely blocking the internal busses.

    An example:

    For the SAI data buffers, which should go zero-copy to ethernet, I had some control variables "in front" of the data, which I checked very often. These being in the same RAM area, I definitely took some time from the DMAs working on the data buffers.

    So again I changed a lot:

    • internal SRAM:
      • DTCM:
        • SAI buffer control variables
        • ethernet descriptors
        • ALL other variables
      • SRAM1:
        • SAI buffers that go zero copy to ethernet, 256 x 1460 bytes, 99% of the 368kB
      • SRAM2:
        • only some no-init stuff used at hard fault, basically wasting 16 kB
    • main loop:
      • go through some functions concerning SAI to ETH data transfer and PTP
      • then alternating between "low priority" stuff
      • then go to SLEEP, woken by any interrupt, but the DMAs should have a little more time in the background now
    • internal ADC:
      • is used for monitoring supply voltages
      • it had a MHz sampling clock... not anymore

    So, did that help? No. :(

    I still get zero checksums, which is at least "fixable" by flushing the TX FIFO (but that takes looong! Why?).

    What I found so far, these things are making things worse -> higher chances for the checksum failures:

    • lwIP's http webserver with SSI tags is ... terrible:
      • it's working, but it sends each tag response as a single pbuf to ETH, so there can be like 12 pbufs, some with only a few bytes, but FIRST and LAST segments look to be set correctly
      • the more it is used in parallel with data streaming, the more often the failure shows up
    • when the PC side takes too long to ACK / get the data (starting the compiler is a good way of occupying the PC)

    Could there be any kind of race condition between DTCM / SRAM1 / CPU / DMA2 / ETH-DMA on the interface buses?

    (Again: I have lots of memory barriers in place, esp. before/after accessing the descriptors' OWN bit)

    When the SAI DMA complete interrupt occurs, is it really done writing the buffers from the SAI FIFOs into internal SRAM?

    What could "confuse" the ETH TX FIFO when filling it with many small buffers, like it's done with http / SSI?