Graduate
August 8, 2024
Question

Unstable SPI Slave


I have a custom board with an STM32WB55 (master) and an STM32F031 (slave). They communicate over SPI using DMA. Unfortunately, there are communication problems.

 

To identify the problem I have simplified the setup. I have used an ST NUCLEO-F031K6 and an ST NUCLEO-WB55 board to test the communication (SCLK, MISO and MOSI are used, 250 kHz clock frequency). Both run the SPI example provided by ST (from STM32Cube_FW_F0_V1.11.5 and STM32Cube_FW_WB_V1.19.0).
The slave is set up to use interrupts instead of DMA to keep the setup as simple as possible. The only modification to the original example is that the master sends 8-byte messages and the slave echoes what it receives, in order to test the slave device. Approximately 200 bits out of 10 kB received from the slave are erroneous. The communication is monitored with a logic analyser and the data is analysed with Python.
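
An error figure like this can be obtained by XOR-ing each captured byte with the byte that was expected and counting the set bits. The original analysis script was written in Python and is not shown, so the following is only a minimal C sketch of the same idea; the function name and the two-array layout are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that differ between the expected and the captured bytes. */
size_t count_bit_errors(const uint8_t *expected, const uint8_t *captured, size_t len)
{
    size_t errors = 0;
    for (size_t i = 0; i < len; i++) {
        /* A set bit in diff marks a bit position where the two bytes disagree. */
        uint8_t diff = (uint8_t)(expected[i] ^ captured[i]);
        while (diff != 0) {
            diff &= (uint8_t)(diff - 1); /* clear the lowest set bit */
            errors++;
        }
    }
    return errors;
}
```

Dividing the result by `8 * len` gives the bit error rate for the capture.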

 

Is this success/failure rate expected? Is this something that you have come across before? Do you have suggestions for solutions?

 

Here is an example of an error from the test:

Nikolaj_TL_0-1723122821437.png

(the slave actually echoes the message received +1, in order to easily compare the messages)

    9 replies

    Super User
    August 8, 2024

    > Is this success/failure rate expected? Is this something that you have come across before? Do you have suggestions for solutions?

    You shouldn't be having random bits be incorrect. Noise can be a problem with long lead lines, if your clock rate is high. Using a CS pin is typically preferred, as it can allow for re-synchronization between master/slave.

    > (the slave actually echoes the message received +1, in order to easily compare the messages)

    But your output typically shows the same values (e.g. 0x01/0x01), not X/X+1 (0x01/0x02)?

    Perhaps zoom in on a transaction that failed within the logic analyzer to see and show more of what's going on.

    Technical Moderator
    August 9, 2024

    Hello @Nikolaj_TL 

    The success/failure rate you are experiencing is not typical for reliable SPI communication. Please make sure you have a stable connection between the boards and that the wires are short.

    Graduate
    August 27, 2024

    Hello @TDK

    I have added CS (NSS) and I still get the same number of errors. I have zoomed in on a transaction that failed and see that the slave TxBuffer is shifted oddly and that this also affects the following transmissions from the slave.

    Nikolaj_TL_1-1724765836299.png

    In the test setup the master sends x1, x2, x3 ... and the slave receives this and updates the TxBuffer: x1+1, x2+1, x3+1 ... Because the slave's response is delayed by one, the messages should always correspond to each other.
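
    That relationship can be checked mechanically on an exported capture. The following is a minimal sketch, assuming the MOSI and MISO bytes have been exported into two equally long arrays; the function name and the `delay` parameter (how many bytes the echo lags, for example 8 if the lag is one 8-byte message) are hypothetical and not part of the attached project.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the index of the first MISO byte that is not equal to the MOSI byte
 * sent `delay` bytes earlier plus one (wrapping at 8 bits), or -1 if the
 * whole capture is consistent. The first `delay` bytes are skipped because
 * there is no earlier MOSI byte for them to echo.
 */
long first_echo_mismatch(const uint8_t *mosi, const uint8_t *miso,
                         size_t len, size_t delay)
{
    for (size_t i = delay; i < len; i++) {
        uint8_t expected = (uint8_t)(mosi[i - delay] + 1u);
        if (miso[i] != expected) {
            return (long)i;
        }
    }
    return -1;
}
```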


    I have made sure that I have the best possible connection using good (and new) jumper wires and a simple setup

    Nikolaj_TL_2-1724766051350.png

    Do you have any suggestions as to where to debug next?

     

     

    Super User
    August 27, 2024

    Probably just a software bug that you need to work through.

    The fact that the slave responds with consecutive duplicate bytes suggests maybe it's not fast enough to send data to the SPI before the master starts to read it, so it repeats the previous byte.

    Try slowing down the clock rate by a factor of 10. If errors disappear, probably the slave code is too slow.

    Graduate
    August 28, 2024

    Hello @TDK,

    I have already lowered the baud rate as much as possible.

    Master: the master SYSCLK is 64 MHz, APB1CLK is also 64 MHz, and the prescaler is SPI_BAUDRATEPRESCALER_256 (the example default is 32). This results in an SPI clock frequency of 250 kHz.

    Slave: the slave runs at 48 MHz and its APB1CLK also runs at 48 MHz.
    So from what I can tell, the slave should have plenty of time to respond to the master.
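
    To put numbers on this: 64 MHz / 256 = 250 kHz, so one 8-bit frame takes 32 µs, during which a 48 MHz core runs through 1536 clock cycles. This counts only the time a frame is actually being clocked; any gap between bytes adds to it. A small sketch of the arithmetic (the function names are hypothetical):

```c
#include <stdint.h>

/* SPI bit clock in Hz for a given peripheral clock and baud-rate prescaler. */
uint32_t spi_clock_hz(uint32_t pclk_hz, uint32_t prescaler)
{
    return pclk_hz / prescaler;
}

/* Slave core clock cycles that elapse while one frame of `bits` bits is clocked. */
uint32_t slave_cycles_per_frame(uint32_t slave_hclk_hz, uint32_t sck_hz, uint32_t bits)
{
    /* 64-bit intermediate so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)slave_hclk_hz * bits) / sck_hz);
}
```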


    I have setup a slave debug pin to toggle when the slave enters and exits the SPI interrupt callback (Slave_Debug_Pin), in order to further investigate the problem:

    Nikolaj_TL_1-1724826827119.png

     

    When a failure occurs, it follows a transfer after which the interrupt callback was not called:

    Nikolaj_TL_2-1724826944147.png

     

    In the next transaction after the missing callback, the interrupt callback is called in the middle of the transaction:
    Nikolaj_TL_0-1724825845436.png
    The same applies to the next few transfers after which the slave comes back into sync again. 

     

     

     

    Super User
    August 28, 2024

    Looks like a byte is dropped or missed. At least, that would explain the behavior.

    In the previous transactions, are all 8 bytes present?

    TDK_0-1724853847500.png

    Consider sending 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 from the slave for every transaction in order to debug the problem better. You will be able to see exactly where things go wrong.
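
    With a fixed pattern, the first deviation can also be found automatically in an exported capture. A minimal sketch, assuming the MISO bytes are available as an array that starts on a transaction boundary (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

#define PATTERN_LEN 8u

/*
 * Return the index of the first captured byte that breaks the repeating
 * 0x01..0x08 pattern, or -1 if every byte matches.
 */
long first_pattern_break(const uint8_t *miso, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t expected = (uint8_t)((i % PATTERN_LEN) + 1u);
        if (miso[i] != expected) {
            return (long)i;
        }
    }
    return -1;
}
```

    The returned offset can then be looked up in the logic analyser trace to see what happened on the bus just before the pattern broke.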

    Graduate
    August 29, 2024

    In the transaction you have marked all bytes are present:

     

    Nikolaj_TL_1-1724909793296.png

     

    Graduate
    August 29, 2024

    As suggested I have set the slave to send: 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08.

    Here are the results:

    Nikolaj_TL_1-1724911683099.png

    The pictures and the Logic Analyser file are attached.

    Here is a snippet of some of the transactions:

    0.png    11.png   22.png 
    33.png   44.png   5Nikolaj_TL_2-1724912171324.png ...

    Super User
    August 29, 2024

    Okay, so there's a spurious 0x05 appearing between transactions, sometimes. This explains the behavior and why bytes are shifted.

    Let's assume the microcontroller is just doing what the code tells it to do, which is a very safe assumption. Where could this 0x05 be coming from? If I had to guess, probably it's the first bytes of the second word you're sending, which means the slave is underflowing. This explanation matches up with the spurious 0xFA in the opening post.

    The problem and solution here are likely going to be in the slave code. It looks like the 0x05 is not ready in time. But with 10ms per transaction, that is plenty of time. Can you share the slave code?

     

    If you have batches of 8 bytes, it might make more sense to keep CS low during the entire 8 bytes, rather than raising it on each individual byte.

    Graduate
    August 30, 2024

    I too am convinced that the problem lies in the slave code. There is definitely something about the first byte in the second word that might unveil the cause of the problem.

    Below I have shared some of the code, the full project is attached to the post. As mentioned before the project is based on the example project from ST only with minor changes:
    STM32Cube_FW_F0_V1.11.5\Projects\STM32F031K6-Nucleo\Examples\SPI\SPI_FullDuplex_ComIT

     

     

    int main(void)
    {
        HAL_Init();
        SystemClock_Config(); /* Configure the system clock to 48 MHz */

        /* Set the SPI parameters */
        SpiHandle.Instance = SPIx;
        SpiHandle.Init.Mode = SPI_MODE_SLAVE;
        SpiHandle.Init.Direction = SPI_DIRECTION_2LINES;
        SpiHandle.Init.CLKPhase = SPI_PHASE_1EDGE;
        SpiHandle.Init.CLKPolarity = SPI_POLARITY_LOW;
        SpiHandle.Init.DataSize = SPI_DATASIZE_8BIT;
        SpiHandle.Init.FirstBit = SPI_FIRSTBIT_MSB;
        SpiHandle.Init.TIMode = SPI_TIMODE_DISABLE;
        SpiHandle.Init.CRCCalculation = SPI_CRCCALCULATION_DISABLE;
        SpiHandle.Init.CRCPolynomial = 7;
        SpiHandle.Init.CRCLength = SPI_CRC_LENGTH_8BIT;
        SpiHandle.Init.NSS = SPI_NSS_HARD_INPUT; // SPI_NSS_SOFT
        SpiHandle.Init.NSSPMode = SPI_NSS_PULSE_DISABLE;

        if (HAL_SPI_Init(&SpiHandle) != HAL_OK) {
            Error_Handler(); /* Initialization Error */
        }

        if (HAL_SPI_TransmitReceive_IT(&SpiHandle, (uint8_t *)aTxBuffer, (uint8_t *)aRxBuffer, BUFFERSIZE) != HAL_OK) {
            Error_Handler(); /* Transfer error in transmission process */
        }

        /* Infinite loop */
        while (1)
        { }
    }

     

     

    void HAL_SPI_TxRxCpltCallback(SPI_HandleTypeDef *hspi) {
    	HAL_GPIO_TogglePin(DEBUG_GPIO_PORT, DEBUG_PIN);	/* Debug Pin */
    
    	/* Clear Buffer */
    	memset(aTxBuffer, 0, BUFFERSIZE);
    
    	for(uint8_t i=0; i<BUFFERSIZE; i++) {
    		aTxBuffer[i] = i+1;
    	}
    
    	HAL_SPI_TransmitReceive_IT(&SpiHandle, (uint8_t*)aTxBuffer, (uint8_t *)aRxBuffer, BUFFERSIZE);
    	HAL_GPIO_TogglePin(DEBUG_GPIO_PORT, DEBUG_PIN); 	/* Debug Pin */
    }

     

     

     

    Super User
    August 30, 2024

    I took a look. Pretty simple, that should be working.

    Only thing I can think of is a signal integrity issue, but your scope plots look good, and the speed is slow.

    Perhaps do something in HAL_SPI_ErrorCallback like toggle a pin so you know if it is detecting errors. Could be silently falling through.

    That's all I got. Let us know if you solve it.

    Graduate
    August 30, 2024

    I have set HAL_SPI_ErrorCallback to abort the transaction and signal on the Debug Pin. If it is hit, no more transactions should be sent on MISO, as no new transfers are set up.
    After testing I can tell that the ErrorCallback is not called.

     

    void HAL_SPI_ErrorCallback(SPI_HandleTypeDef *hspi)
    {
     HAL_SPI_Abort(&SpiHandle);
    
     HAL_GPIO_TogglePin(DEBUG_GPIO_PORT, DEBUG_PIN); 	/* Debug Pin */
     HAL_Delay(100);
     HAL_GPIO_TogglePin(DEBUG_GPIO_PORT, DEBUG_PIN); 	/* Debug Pin */
    }