Visitor II
December 3, 2025
Question

FD-CAN - Protocol Error - ACK Error


Hello,

We are facing an intermittent CAN-FD issue in the field and would appreciate guidance from the community.

Our system has two devices on the bus (no other device on the bus) using a request–response architecture. The master sends a request every 30 ms and the slave responds. This setup is deployed in hundreds of units running continuously (24 hours). Out of these, around 3 to 5 units per day show acknowledgement errors, which we track through the CAN protocol error counters.

The behaviour is unusual:

• The issue appears randomly on any unit.
• No bus-off condition is ever reported.
• Despite no bus-off, communication between the two nodes temporarily stops.
• Communication recovers automatically after a few seconds without any intervention.

Initially, we suspected a physical wiring problem. We re-checked all connectors and even secured them with glue. The bus has 120-ohm termination at both ends. However, the issue still appears randomly.

Below are the system details:

Microcontroller: STM32G0B1CBT6
Baudrate: 125 kbps
CAN bus length: ~100 cm
Termination: 120 ohms at both ends
FD-CAN Core Clock: 50 MHz
ClockDivider: 1
Bitrate Switching: Disabled
AutoRetransmission: Disabled
TransmitPause: Disabled
ProtocolException: Disabled

Nominal Bit Timing:
• Prescaler = 10
• SyncJumpWidth = 8
• TimeSeg1 = 31
• TimeSeg2 = 8

Data Bit Timing (BRS disabled, same as nominal):
• Prescaler = 10
• SyncJumpWidth = 8
• TimeSeg1 = 31
• TimeSeg2 = 8
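As a cross-check of the values above: the nominal bit rate is the FDCAN kernel clock divided by the prescaler and by the number of time quanta per bit (1 sync quantum + TimeSeg1 + TimeSeg2), and the sample point is (1 + TimeSeg1) divided by that total. The following is a minimal C sketch. It assumes ClockDivider = 1 and the HAL convention that TimeSeg1/TimeSeg2 are given directly in time quanta. The function names are hypothetical:

```c
#include <stdint.h>

/* Time quanta per nominal bit: 1 sync quantum + TimeSeg1 + TimeSeg2. */
static uint32_t fdcan_tq_per_bit(uint32_t tseg1, uint32_t tseg2)
{
    return 1u + tseg1 + tseg2;
}

/* Bit rate in bit/s for a kernel clock in Hz (ClockDivider assumed to be 1). */
static uint32_t fdcan_bitrate(uint32_t kernel_hz, uint32_t prescaler,
                              uint32_t tseg1, uint32_t tseg2)
{
    return kernel_hz / (prescaler * fdcan_tq_per_bit(tseg1, tseg2));
}

/* Sample point in per mille of the bit time: (1 + TimeSeg1) / tq per bit. */
static uint32_t fdcan_sample_point_permille(uint32_t tseg1, uint32_t tseg2)
{
    return (1000u * (1u + tseg1)) / fdcan_tq_per_bit(tseg1, tseg2);
}
```

With 50 MHz, prescaler 10, TimeSeg1 = 31 and TimeSeg2 = 8, `fdcan_bitrate()` returns 125000 and `fdcan_sample_point_permille()` returns 800 (80 %).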

Filters:
• StdFiltersNbr = 1
• ExtFiltersNbr = 0


Troubleshooting Steps Already Performed

  1. Physical wiring check
    • Verified connector seating and cable condition
    • Applied glue to prevent vibration-related disconnection
    • Confirmed correct 120-ohm termination at both ends

  2. Error counter monitoring
    • ACK errors observed in protocol error counters
    • No error-warning, error-passive, or bus-off states reported

  3. Timing verification
    • Checked nominal bit timing settings
    • Ensured both nodes use identical configurations
    • Bitrate switching is disabled on both sides

  4. Bus recovery logic
    • Bus-off recovery is implemented
    • Never triggered during these events

  5. Environmental factors
    • Units run 24×7
    • Errors occur randomly across different devices and locations


CAN Topology Summary

Master Device
↕ (approx. 100 cm cable)
Slave Device

Termination resistors (120 ohms) are present at both ends. No other nodes are connected.


Any insights or suggestions on what could cause intermittent ACK errors without bus-off would be greatly appreciated.

 

    This topic has been closed for replies.

    8 replies

    Technical Moderator
    December 3, 2025

    Hello,

    What is the clock source of the FDCAN: a crystal or the internal RC oscillator? You should definitely use an external crystal.

    Please also refer to these FDCAN-related knowledge base articles:

    STM32 FDCAN running at 8 Mb/s on NUCLEO boards

    How to use FDCAN to create a simple communication with a basic filter

    FAQ: Fixing STM32 FDCAN communication disruptions - APB bus, kernel, and time quanta clocks

    Explorer
    December 3, 2025

    > Our system has two devices on the bus (no other device on the bus) using a request–response architecture.

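    I don't think this is an appropriate description for a CAN connection.
    The (standardized) CAN frame contains an ACK slot near its end, and every node that receives the message without error acknowledges it.
    This is built into the CAN peripheral IP; no core intervention is required. On a two-node bus, an ACK error at the transmitter therefore means the only other node did not acknowledge the frame, e.g. because it did not receive it correctly or was not active on the bus at that moment.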

    • The issue appears randomly on any unit.

    This sounds like a noise / EMI issue. See below.

    • No bus-off condition is ever reported.
    • Despite no bus-off, communication between the two nodes temporarily stops.
    • Communication recovers automatically after a few seconds without any intervention.

    "Bus off" is only the last stage.
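    Most probably the sending node goes into "error passive" mode. An error-passive node can still transmit, but it only sends recessive (passive) error flags and must wait an additional suspend-transmission time after each of its own transmissions.
    Normal ("error active") operation resumes once the error counters, which successful transfers decrement, fall back below 128.
    A node becomes error passive when TEC or REC exceeds 127, and goes bus off when TEC exceeds 255; these thresholds are fixed by the CAN standard.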
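ISO 11898-1 fault confinement has an exception that fits the "no bus-off" observation on a two-node bus. A transmitter that is already error passive and detects an ACK error (without seeing a dominant bit during its passive error flag) does not increment its TEC. A node whose only peer stops acknowledging therefore climbs to error passive (TEC = 128) and stays there instead of going bus off. The deliberately simplified C model below illustrates this. It tracks only TEC, and only ACK errors and successful transmissions. All names are hypothetical:

```c
#include <stdint.h>

typedef enum { ERROR_ACTIVE, ERROR_PASSIVE, BUS_OFF } can_state_t;

/* Hypothetical model: only the transmit error counter (TEC) is tracked;
 * the receive error counter (REC), which also affects the real state, is ignored. */
typedef struct {
    uint32_t tec;
} can_tx_model_t;

static can_state_t can_tx_state(const can_tx_model_t *n)
{
    if (n->tec > 255u) return BUS_OFF;
    if (n->tec > 127u) return ERROR_PASSIVE;
    return ERROR_ACTIVE;
}

/* The transmitter saw no dominant bit in the ACK slot. */
static void can_tx_ack_error(can_tx_model_t *n)
{
    switch (can_tx_state(n)) {
    case ERROR_ACTIVE:
        n->tec += 8u;   /* normal transmitter penalty */
        break;
    case ERROR_PASSIVE:
        /* Exception: an ACK error while error passive (no dominant bit seen
         * during the passive error flag) leaves TEC unchanged, so ACK errors
         * alone never push the node to bus off. */
        break;
    case BUS_OFF:
        break;          /* a bus-off node does not transmit */
    }
}

/* A frame was transmitted and acknowledged. */
static void can_tx_success(can_tx_model_t *n)
{
    if (n->tec > 0u && can_tx_state(n) != BUS_OFF) {
        n->tec -= 1u;
    }
}
```

In this model, with AutoRetransmission disabled and one request every 30 ms, 16 consecutive unacknowledged requests (about 0.5 s) are enough to reach error passive. Each acknowledged frame afterwards lowers TEC by one.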

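    I would recommend checking the respective error counters in your code.
    On bxCAN these are the byte fields "REC" and "TEC" in CAN->ESR (error status register); on the FDCAN peripheral of the STM32G0 and STM32H7 they are in FDCAN_ECR, and the error-warning, error-passive and bus-off flags plus the last error code are in FDCAN_PSR.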
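For the FDCAN peripheral on the STM32G0B1 and STM32H743, a minimal decoding sketch follows. The bit positions are taken from the Bosch M_CAN register layout that FDCAN is based on: ECR has TEC 7:0, REC 14:8, RP 15 and CEL 23:16, and PSR has LEC 2:0, EP 5, EW 6 and BO 7. Check them against the reference manual. The struct and function names are hypothetical. On target, the inputs would be `FDCAN1->ECR` and `FDCAN1->PSR`. Per the M_CAN documentation, reading ECR clears CEL and reading PSR sets LEC to 7 ("no change"), so each register should be read once per snapshot. The HAL functions `HAL_FDCAN_GetErrorCounters()` and `HAL_FDCAN_GetProtocolStatus()` return the same information in structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* FDCAN_ECR fields (assumed M_CAN layout) */
#define ECR_TEC_MASK  0x000000FFu   /* bits 7:0   transmit error counter */
#define ECR_REC_MASK  0x00007F00u   /* bits 14:8  receive error counter */
#define ECR_REC_POS   8u
#define ECR_RP        (1u << 15)    /* receive error passive */
#define ECR_CEL_MASK  0x00FF0000u   /* bits 23:16 CAN error logging */
#define ECR_CEL_POS   16u

/* FDCAN_PSR fields (assumed M_CAN layout) */
#define PSR_LEC_MASK  0x00000007u   /* bits 2:0 last error code */
#define PSR_EP        (1u << 5)     /* error passive */
#define PSR_EW        (1u << 6)     /* error warning */
#define PSR_BO        (1u << 7)     /* bus off */

/* Last error code values */
enum { LEC_NONE = 0, LEC_STUFF, LEC_FORM, LEC_ACK, LEC_BIT1, LEC_BIT0, LEC_CRC, LEC_NO_CHANGE };

typedef struct {
    uint8_t tec, rec, cel, lec;
    bool rx_passive, error_passive, warning, bus_off;
} fdcan_err_snapshot_t;

static fdcan_err_snapshot_t fdcan_decode(uint32_t ecr, uint32_t psr)
{
    fdcan_err_snapshot_t s;
    s.tec           = (uint8_t)(ecr & ECR_TEC_MASK);
    s.rec           = (uint8_t)((ecr & ECR_REC_MASK) >> ECR_REC_POS);
    s.rx_passive    = (ecr & ECR_RP) != 0u;
    s.cel           = (uint8_t)((ecr & ECR_CEL_MASK) >> ECR_CEL_POS);
    s.lec           = (uint8_t)(psr & PSR_LEC_MASK);
    s.error_passive = (psr & PSR_EP) != 0u;
    s.warning       = (psr & PSR_EW) != 0u;
    s.bus_off       = (psr & PSR_BO) != 0u;
    return s;
}
```

Logging such snapshots with a timestamp would show whether the events coincide with ACK errors (LEC = 3) and how far TEC climbs.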

    To track the issue down, I would instrument the code to check for this condition and toggle, e.g., a GPIO when such an error occurs. This can be used as a scope / logic analyser trigger to record the bus signals at that time.
    Of course, a sufficiently long period leading up to the event needs to be recorded.
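The GPIO-trigger idea could look roughly like the following small edge detector. It reports an event whenever TEC or REC has increased, or an ACK error was reported, since the previous call. It is only a sketch. The trigger callback is hypothetical and on an STM32 would typically drive a spare pin with `HAL_GPIO_WritePin()`. The inputs could come from the register decoding or the HAL status functions mentioned above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LEC_ACK_ERROR 3u   /* FDCAN last-error-code value for "Ack Error" */

/* Hypothetical trigger hook, e.g. a function that sets a spare GPIO. */
typedef void (*error_trigger_fn)(void);

typedef struct {
    uint8_t  last_tec;
    uint8_t  last_rec;
    uint32_t trigger_count;   /* number of events seen so far */
} error_watch_t;

/* Call periodically with fresh TEC, REC and last-error-code values.
 * Returns true (and calls trig, if given) when a counter increased or an
 * ACK error was reported since the previous call. */
static bool error_watch_update(error_watch_t *w, uint8_t tec, uint8_t rec,
                               uint8_t lec, error_trigger_fn trig)
{
    const bool counter_rose = (tec > w->last_tec) || (rec > w->last_rec);
    const bool ack_error    = (lec == LEC_ACK_ERROR);

    w->last_tec = tec;
    w->last_rec = rec;

    if (counter_rose || ack_error) {
        w->trigger_count++;
        if (trig != NULL) {
            trig();
        }
        return true;
    }
    return false;
}
```

Triggering on the rising counter rather than on a fixed threshold marks the first error, which is what the pre-trigger recording needs.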

    As mentioned above, I suspect noise or EMI issues.
    Perhaps some high-energy switching event nearby, or EMI interference on the PCB.

    Graduate II
    December 3, 2025

    One more vote for either:

    - bad clock: either HSI / RC or crystal with too high tolerance

    - EMI: what's the environment?

    Explorer
    December 3, 2025

    > One more vote for either:
     - bad clock: either HSI / RC or crystal with too high tolerance

    Yes, that could cause it as well, if the configuration is "at the edge".
    That said, CAN does not seem very sensitive to crystal variations. My company's ECUs, used on heavy construction machinery, mostly use CAN, and there are almost no complaints about CAN issues despite some "average" priced crystals.

    By the way:
    For this kind of issue (timing) a logic analyser is fine.
    For EMI/noise issues, a proper scope is mandatory.

    Graduate II
    December 3, 2025

    Now I have seen that it's "only" running at 125 kbps and no BRS, that makes clock issues less probable.

    I'd also check all EMI precautions, from PCB to PCB, so layout, grounding, case connections, cable connections, cable shields, ...

    Technical Moderator
    December 3, 2025

    Many of our customers have reported that CAN communication stops after a while when they use an RC-based clock, with both CAN and FDCAN. So the first question we ask when a CAN communication issue is reported is: which clock source is used? 99% of these issues come from using the internal RC oscillator. We recommend switching directly to a crystal or crystal oscillator with a suitable ppm tolerance, and I can confirm that this solved their issues.

    TSola.1 (Author)
    Visitor II
    December 5, 2025

    @mƎALLEm, @Ozone and @LCE, thank you for your responses.

    We are continuing to debug a CAN (Classic CAN, 125 kbps) reliability issue and would like further guidance from the community. Below is the detailed information requested earlier, along with our latest observations.

    1. Clock Source
    The nodes are running from the internal RC oscillator. Unfortunately, the hardware design does not include a crystal oscillator, so we cannot switch to an external clock source.

    2. Monitoring Error Counters
    Based on your suggestions, we will update the firmware to continuously log the CAN error counters (TEC/REC) and protocol error status. We will share logs once available.

    3. Nature of the Issue
    We have seen that in some units the issue appears only for a few milliseconds, whereas in others it persists for several minutes. The behaviour is intermittent and varies across units deployed in the field.

    4. Current Baudrate and Future Plan
    We have reduced the baud rate to 12.5 kbps for testing.

    In the next phase, we are planning to shift from continuous periodic communication (every 30 ms) to an event-based communication model. We would appreciate your thoughts on whether such a change would meaningfully improve robustness.

    Since we cannot add an external crystal, we would also like to know if there are alternative methods to mitigate RC-oscillator drift-related problems.

    5. Clock Calibration on STM32H743
    Our second CAN node uses an STM32H743. We were planning to evaluate FDCAN Clock Calibration with 125 kbps (as described in ST’s application note “Introduction to FDCAN peripherals for STM32 MCUs”).

    Introduction to FDCAN peripherals for STM32 MCUs - Application note

    Would using clock calibration on only one node help in any practical way?

    6. Node 1 Configuration (STM32G0B1CBT6)

    • Baudrate: 125 kbps
    • Bus length: ~1 meter
    • Termination: 120 Ω at both ends
    • FDCAN Core Clock: 50 MHz
    • Clock Divider: 1
    • BRS: Disabled
    • Auto-Retransmission: Disabled
    • Transmit Pause: Disabled
    • Protocol Exception: Disabled

    Nominal Bit Timing:
    • Prescaler = 10
    • SyncJumpWidth = 8
    • TimeSeg1 = 31
    • TimeSeg2 = 8

    Data Bit Timing:
    (Same as nominal since BRS is disabled)

    Filters:
    • Standard Filters: 1
    • Extended Filters: 0

    7. Node 2 Configuration (STM32H743)

    • Baudrate: 125 kbps
    • Bus length: ~1 meter
    • Termination: 120 Ω at both ends
    • FDCAN Core Clock: 100 MHz
    • Clock Divider: 1
    • BRS: Disabled
    • Auto-Retransmission: Disabled
    • Transmit Pause: Disabled
    • Protocol Exception: Disabled

    Nominal Bit Timing:
    • Prescaler = 20
    • SyncJumpWidth = 8
    • TimeSeg1 = 31
    • TimeSeg2 = 8

    Data Bit Timing:
    (Same as nominal)

    Filters:
    • Standard Filters: 1
    • Extended Filters: 0

    Any insights would be extremely helpful. Further suggestions for stabilizing communication without a crystal oscillator would also be appreciated.

    Technical Moderator
    December 5, 2025

    @TSola.1 wrote:

    1. Clock Source
    The nodes are running from the internal RC oscillator. Unfortunately, the hardware design does not include a crystal oscillator, so we cannot switch to an external clock source.


    Unfortunately, a crystal is crucial for CAN communication.

    Please also read this article: CAN reception issues: Reasons and general troubleshooting

    Explorer
    December 5, 2025

    I overlooked that.

    The internal RC oscillator is not suited for most communication protocols, especially when the ambient temperature is not constant.

    Graduate II
    December 5, 2025

    Okay, "HSI", bad idea...

    Anyway, given the low bit rate, I would not give up yet.

    Your settings:

    Nominal Bit Timing:
    • Prescaler = 10
    • SyncJumpWidth = 8
    • TimeSeg1 = 31
    • TimeSeg2 = 8

    1) Reduce the prescaler so there's more room for fine-tuning the segments.

    2) I think SyncJumpWidth must be smaller than TimeSeg2 (SJW is used to "lengthen" Tseg1 for sync purposes), so the maximum should be SJWmax = TSeg2 - 1.

    3) ... but use the maximum SJW!

    4) Check your HSI frequency and look into the STM32's HSI calibration features (if any; I have never used them, I always use an external clock).
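Two oscillator-tolerance conditions are commonly used for classic CAN bit timing (from Bosch's CAN bit-timing guidance). They can put numbers on the SJW and HSI points:

- df ≤ min(Phase_Seg1, Phase_Seg2) / (2 × (13 × NBT - Phase_Seg2))
- df ≤ SJW / (20 × NBT)

All lengths are in time quanta, and df is the allowed deviation of each node's clock from nominal. A minimal C sketch follows. It assumes FDCAN TimeSeg1 = Prop_Seg + Phase_Seg1 and takes the propagation part as an input. The 2 tq used below is a guess, not a measurement, and the function name is hypothetical:

```c
#include <stdint.h>

/* Maximum oscillator tolerance df (fraction, per node) for classic CAN.
 * tseg1/tseg2/sjw are the values programmed in the HAL init struct (time quanta);
 * prop_tq is an assumed share of tseg1 used as propagation segment. */
static double can_osc_tolerance(uint32_t tseg1, uint32_t tseg2,
                                uint32_t sjw, uint32_t prop_tq)
{
    const double nbt = 1.0 + tseg1 + tseg2;   /* time quanta per bit */
    const double ps1 = (prop_tq < tseg1) ? (double)(tseg1 - prop_tq) : 0.0;
    const double ps2 = (double)tseg2;
    const double min_ps = (ps1 < ps2) ? ps1 : ps2;

    /* Condition 1: 2*df accumulated over (13 bit times - Phase_Seg2)
     * must fit within the shorter phase segment. */
    const double df1 = min_ps / (2.0 * (13.0 * nbt - ps2));
    /* Condition 2: 2*df accumulated over 10 bit times must fit within SJW. */
    const double df2 = (double)sjw / (20.0 * nbt);

    return (df1 < df2) ? df1 : df2;
}
```

The original 125 kbps settings (TimeSeg1 = 31, TimeSeg2 = 8, SJW = 8) give about 0.78 %. The 12.5 kbps settings posted later in the thread (15, 5, SJW = 2) give only about 0.48 %, limited by the SJW term. Neither formula contains the bit rate, so lowering the bit rate does not by itself relax the clock requirement; the segment ratios and SJW do.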

    TSola.1 (Author)
    Visitor II
    December 5, 2025

    Post edited by ST moderator to be in line with the community rules for code sharing. Next time, please use the </> button to paste your code and linker script content. Please read this post: How to insert source code.

    1. Clock Configuration of STM32G0B1CBT6 with baudrate 12.5 kbps:

     

    [Screenshot: STM32G0B1CBT6 clock configuration, not reproduced]

    hfdcan1.Instance = FDCAN1;
    hfdcan1.Init.ClockDivider = FDCAN_CLOCK_DIV1;   /* kernel clock not divided */
    hfdcan1.Init.FrameFormat = FDCAN_FRAME_FD_NO_BRS;
    hfdcan1.Init.Mode = FDCAN_MODE_NORMAL;
    hfdcan1.Init.AutoRetransmission = DISABLE;
    hfdcan1.Init.TransmitPause = DISABLE;
    hfdcan1.Init.ProtocolException = DISABLE;
    /* Nominal bit = 1 (sync) + 15 + 5 = 21 time quanta; sample point 16/21, about 76 % */
    hfdcan1.Init.NominalPrescaler = 10;
    hfdcan1.Init.NominalSyncJumpWidth = 2;
    hfdcan1.Init.NominalTimeSeg1 = 15;
    hfdcan1.Init.NominalTimeSeg2 = 5;
    /* Data phase mirrors the nominal phase because BRS is not used */
    hfdcan1.Init.DataPrescaler = 10;
    hfdcan1.Init.DataSyncJumpWidth = 2;
    hfdcan1.Init.DataTimeSeg1 = 15;
    hfdcan1.Init.DataTimeSeg2 = 5;
    hfdcan1.Init.StdFiltersNbr = 1;
    hfdcan1.Init.ExtFiltersNbr = 0;
    hfdcan1.Init.TxFifoQueueMode = FDCAN_TX_FIFO_OPERATION;

    2. Current Configuration of STM32H743 with baudrate 12.5 kbps:

    [Screenshots (2): STM32H743 configuration, not reproduced]


    hfdcan1.Instance = FDCAN1;
    hfdcan1.Init.FrameFormat = FDCAN_FRAME_FD_NO_BRS;
    hfdcan1.Init.Mode = FDCAN_MODE_NORMAL;
    hfdcan1.Init.AutoRetransmission = DISABLE;
    hfdcan1.Init.TransmitPause = DISABLE;
    hfdcan1.Init.ProtocolException = DISABLE;
    /* Nominal bit = 1 (sync) + 15 + 5 = 21 time quanta; sample point 16/21, about 76 % */
    hfdcan1.Init.NominalPrescaler = 20;
    hfdcan1.Init.NominalSyncJumpWidth = 2;
    hfdcan1.Init.NominalTimeSeg1 = 15;
    hfdcan1.Init.NominalTimeSeg2 = 5;
    /* Data phase mirrors the nominal phase because BRS is not used */
    hfdcan1.Init.DataPrescaler = 20;
    hfdcan1.Init.DataSyncJumpWidth = 2;
    hfdcan1.Init.DataTimeSeg1 = 15;
    hfdcan1.Init.DataTimeSeg2 = 5;
    /* Message RAM partitioning */
    hfdcan1.Init.MessageRAMOffset = 0;
    hfdcan1.Init.StdFiltersNbr = 1;
    hfdcan1.Init.ExtFiltersNbr = 0;
    hfdcan1.Init.RxFifo0ElmtsNbr = 1;
    hfdcan1.Init.RxFifo0ElmtSize = FDCAN_DATA_BYTES_12;
    hfdcan1.Init.RxFifo1ElmtsNbr = 1;
    hfdcan1.Init.RxFifo1ElmtSize = FDCAN_DATA_BYTES_12;
    hfdcan1.Init.RxBuffersNbr = 0;
    hfdcan1.Init.RxBufferSize = FDCAN_DATA_BYTES_12;
    hfdcan1.Init.TxEventsNbr = 1;
    hfdcan1.Init.TxBuffersNbr = 1;
    hfdcan1.Init.TxFifoQueueElmtsNbr = 1;
    hfdcan1.Init.TxFifoQueueMode = FDCAN_TX_FIFO_OPERATION;
    hfdcan1.Init.TxElmtSize = FDCAN_DATA_BYTES_8;

    We will try reducing the prescaler and using the HSI calibration feature, if there is one.
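Two hedged observations on the 12.5 kbps values follow, since the clock screenshots are not reproduced here.

First, suppose the FDCAN kernel clock were still 50 MHz on the G0 (or 100 MHz on the H743). Then prescaler 10 (or 20) with 1 + 15 + 5 = 21 time quanta would give 50 MHz / 210 ≈ 238 kbps rather than 12.5 kbps. It is therefore worth confirming the kernel clock each node actually receives.

Second, following the earlier "reduce the prescaler" suggestion, the sketch below searches for the smallest prescaler that hits a target bit rate exactly at a chosen sample point. The field limits are assumptions based on the NBTP register widths. SJW is simply set to TimeSeg2. All names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits of the nominal bit-timing fields (HAL values, in time quanta);
 * verify them against the reference manual of your device. */
#define NOM_PRESC_MAX   512u
#define NOM_TSEG1_MIN     2u
#define NOM_TSEG1_MAX   256u
#define NOM_TSEG2_MIN     2u
#define NOM_TSEG2_MAX   128u

typedef struct {
    uint32_t prescaler, tseg1, tseg2, sjw;
} nominal_timing_t;

/* Searches upward from prescaler 1, so the first hit uses the most time quanta
 * per bit. The bit rate must be met exactly; the sample point is rounded to the
 * nearest quantum. Returns false if nothing fits within the assumed limits. */
static bool find_nominal_timing(uint32_t kernel_hz, uint32_t bitrate,
                                uint32_t sp_permille, nominal_timing_t *out)
{
    if (bitrate == 0u || sp_permille >= 1000u) {
        return false;
    }
    for (uint32_t presc = 1u; presc <= NOM_PRESC_MAX; ++presc) {
        const uint64_t div = (uint64_t)presc * bitrate;
        if (kernel_hz % div != 0u) {
            continue;                               /* bit rate would not be exact */
        }
        const uint64_t tq = kernel_hz / div;        /* time quanta per bit */
        /* quanta up to the sample point, including the sync quantum */
        const uint64_t sp_tq = (tq * sp_permille + 500u) / 1000u;
        if (sp_tq < 1u + NOM_TSEG1_MIN || sp_tq >= tq) {
            continue;
        }
        const uint64_t tseg1 = sp_tq - 1u;
        const uint64_t tseg2 = tq - sp_tq;
        if (tseg1 > NOM_TSEG1_MAX || tseg2 < NOM_TSEG2_MIN || tseg2 > NOM_TSEG2_MAX) {
            continue;
        }
        out->prescaler = presc;
        out->tseg1 = (uint32_t)tseg1;
        out->tseg2 = (uint32_t)tseg2;
        out->sjw = (uint32_t)tseg2;                 /* largest SJW, capped at TimeSeg2 */
        return true;
    }
    return false;
}
```

At 50 MHz, 12.5 kbps and an 80 % sample point, this yields prescaler 16 with TimeSeg1 = 199 and TimeSeg2 = 50 (250 tq per bit). At 100 MHz it yields prescaler 25 with 255/64 (320 tq). The point of the longer segments is finer granularity for SJW and the sample point. If SJW has to stay strictly below TimeSeg2, subtract one from `sjw`.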

    Technical Moderator
    December 5, 2025

    Even though I still insist on using a crystal, please post some photos of your CAN network setup: the different nodes and the CAN bus wiring.