Explorer II
February 13, 2024
Solved

stm32L4S5, spi pause between 8-bit frames

  • February 13, 2024
  • 15 replies
  • 5799 views

Hi all,

I am performing some 400-byte block transfers over SPI, at 16MHz.

I have NSS off, so no CS up/down between frames.

From my testing, whatever system clock or SPI clock I set, there is a fixed 4us pause between two consecutive frames. This pause is of course very visible, and costly, at a 20MHz SPI clock. On the scope it looks something like:

__||||||||_____________||||||||______________|||||||_____

Transfer is done by LL_SPI_TransmitData8

Any help in reducing this delay between frames would be great; even knowing for certain that it cannot be reduced would be helpful.

thanks a lot

 

 

    This topic has been closed for replies.
    Best answer by TDK


    15 replies

    Super User
    February 13, 2024

    Use DMA or improve the speed of your code. Show your code if you want tips on how to improve it.

    Compiling using Release settings will also speed things up a bit.

    heisenbugAuthor
    Explorer II
    February 13, 2024

    hi @TDK 

    thanks a lot.

    Well, the code is actually the one in Zephyr OS, but anyway, I changed the MCU clock speed
    from 80 to 120MHz and this delay is still exactly the same, 4us. So it looks like the code is not impacting it.

    Sure, if you confirm I cannot reduce it, my next step is using DMA.

    And a question: could TI mode help here (I can control CS by software if needed)?

    TDKAnswer
    Super User
    February 13, 2024

    Code is definitely impacting it. The hardware will only have pauses when the code can't keep up.

    I don't see how TI mode would help.
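
One way to read the claim above is as a cycle budget: the CPU has to deliver the next byte within one frame time, otherwise the bus idles. The sketch below is only arithmetic on figures quoted in this thread (80 and 120MHz core clock, 20MHz SPI clock, 8-bit frames, a 4us gap). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles that elapse while one frame of `bits` bits is shifted out. */
static uint32_t cycles_per_frame(uint32_t cpu_hz, uint32_t spi_hz, uint32_t bits)
{
    assert(spi_hz != 0u);
    return (uint32_t)(((uint64_t)cpu_hz * bits) / spi_hz);
}

/* CPU cycles that elapse during an idle gap of `gap_ns` nanoseconds. */
static uint32_t cycles_in_gap(uint32_t cpu_hz, uint32_t gap_ns)
{
    return (uint32_t)(((uint64_t)cpu_hz * gap_ns) / 1000000000u);
}
```

With these inputs, cycles_per_frame(80000000, 20000000, 8) gives 32 and cycles_in_gap(80000000, 4000) gives 320, so a 4us gap at 80MHz is about ten frame times' worth of CPU cycles spent somewhere between two writes.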

    heisenbugAuthor
    Explorer II
    February 13, 2024

    @TDK , sorry, it is not really clear to me how this 4us can stay exactly the same when I make the system clock 30% faster: the code should execute faster, so the gap should shrink. Could you maybe explain this in a bit more depth?

    @AScha.3 

    thanks,

    I am quite new to STM32. I am actually using LL_SPI_TransmitData8, which should already be "direct register access"?

    Super User
    February 13, 2024

    Sorry, I don't know; I haven't used LL in a long time, not since ST switched to HAL and LL ...

    But yes, the macro should do it. If I write to the register directly, though, I know it happens at max speed.

    Example of a direct write to SPI:

     

    SPI2->DR = cmd; //HAL_SPI_Transmit(&ILI9341_SPI_PORT, &cmd, sizeof(cmd), HAL_MAX_DELAY);

     

    The HAL call needs about 900ns, the direct write about 14ns. (But with no error checking etc., which HAL always does.)

    I use the HAL (hoping there are no errors in it) because you get what you want without much fiddling around.

    And if something needs to be faster, I play the game of writing to registers directly.

    Most of the time it doesn't matter whether switching something on takes 10ns or 10us; as a human you are many times too slow to see it anyway. Only in some cases, here when drawing the background grid, do you really see the 300ms it takes to draw the lines, so that is where speeding up is useful. The same drawing in 6ms looks like an "instant" grid. But this is just like a crossword puzzle for me, just for fun.

    You can simply use the HAL and the peripherals for their intended purpose, so set up DMA -> SPI to transfer a block of data at maximum speed. No need for a crossword puzzle; when I am at work, I am not there for fun.

    Graduate
    February 14, 2024

    If it were me, I would write some dedicated test software to confirm the behaviour, writing directly to registers to ensure that no other software is causing the effect.

    heisenbugAuthor
    Explorer II
    February 14, 2024

    Hi,

    still thanks all

    I did a brief test, writing my data blocks by

    while (l--) {
     *(volatile uint8_t *)(0x4000380c) = *p++;
    }

    so writing to the data register directly, bypassing any other OS code.

    The result is always the same:

    [Attached image: heisenbug_0-1707907028521.png]

    The gap between frames is of course very visible since I now use a 20MHz SPI bus clock.

    So it seems I cannot improve this behavior in any way as of now.

     

     

    Graduate
    February 14, 2024

    Shouldn't your simple test include checking the status register (0x40003808 ?) and only sending new data when the TXE (Transmit Buffer Empty) bit is set?

    Super User
    February 14, 2024

    Right!

    I check (on my CPU) "FIFO not full" ...

    Also, to get optimum speed you need to compile with optimization, -O2.

     

    heisenbugAuthor
    Explorer II
    February 14, 2024

    Hi greg,

    thanks,

    test changed as

    while (l--) {
        if (*(volatile uint8_t *)(0x40003808) & (1 << 1))
            *(volatile uint8_t *)(0x4000380c) = *p++;
    }

    No improvement, still this 4us gap, which seems to be a recurring issue judging by Google search results:

    [Attached image: heisenbug_0-1707917629969.png]

    This really seems to be a limitation of the STM32 hardware controller.

    As you can see, I am working on a specific SPI protocol variant that requires CS to stay asserted over the full transfer. Anyway, to talk to this slave chip I still have the standard SPI option available, so I can try DMA in standard SPI mode.

    Thanks all for now, but any helpful comment is still welcome.

    Graduate II
    February 14, 2024

    Interesting. I am pretty sure that it should work much faster even without DMA (though using DMA is the best way).
    a) Check your compiler optimization level
    b) Check your core clock (it is possible that SPI runs at 120MHz but the core is "underclocked" to a slower speed)
    c) Does "p" point to internal RAM or to some external memory?
    d) Use 32-bit access to SPI (write four bytes in one instruction/transaction)

    Warning: you should not only check flags, but also react to their values. Simply write new data only if there is free space in the buffer.
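
The warning above can be sketched as a loop that waits for the flag and only then writes, so that no byte is skipped when the buffer is busy. This is a minimal sketch, not the Zephyr driver: `spi_regs_t`, `spi_send_polled` and the `max_spins` bound are hypothetical names introduced here, and the register offsets (SR at 0x08, DR at 0x0C, TXE as bit 1) are taken from the addresses used in the snippets earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the first four SPI registers (offsets 0x00..0x0C). */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
    volatile uint32_t SR;   /* offset 0x08 */
    volatile uint32_t DR;   /* offset 0x0C */
} spi_regs_t;

#define SR_TXE_MASK (1u << 1)  /* same bit the thread's snippet tests */

/*
 * Write up to `len` bytes, waiting for TXE before each one.
 * `max_spins` bounds every wait so the function cannot hang forever.
 * Returns the number of bytes actually written.
 */
static size_t spi_send_polled(spi_regs_t *spi, const uint8_t *buf,
                              size_t len, uint32_t max_spins)
{
    assert(spi != NULL && buf != NULL);
    for (size_t i = 0; i < len; i++) {
        uint32_t spins = 0;
        while ((spi->SR & SR_TXE_MASK) == 0u) {
            if (++spins > max_spins) {
                return i;   /* gave up: the flag never came up */
            }
        }
        /* 8-bit access to the data register, as in the thread's snippet. */
        *(volatile uint8_t *)&spi->DR = buf[i];
    }
    return len;
}
```

On the target this would presumably be called as spi_send_polled((spi_regs_t *)0x40003800, p, l, some_bound), where the base address is inferred from the 0x40003808 and 0x4000380c addresses above. Passing the register block as a pointer also lets the loop be exercised on a host against a plain struct.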

    Super User
    February 14, 2024

    Which STM32?

    Disable interrupts before that snippet of code. Disable all DMAs (simply disable their clocks in RCC). Try to transmit a *constant* in an *infinite loop* (i.e. avoid reading any variables from memory), plus what @gregstm wrote above.

    Read out and check/post SPI registers content.

    JW
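
A rough sketch of the "disable interrupts around the snippet" part of this suggestion, assuming a GCC-style compiler and a Cortex-M core running privileged code. All names here (`irq_save_and_mask`, `irq_restore`, `run_with_irqs_masked`, `example_body`) are hypothetical, and on a non-Cortex-M build the helpers compile to no-ops so the wrapper can be tried on a host. Under Zephyr, the kernel's own irq_lock()/irq_unlock() pair may be the more natural choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 1 only when building for a Cortex-M target with a GCC-style compiler. */
#if defined(__GNUC__) && defined(__arm__) && \
    defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
#define ON_CORTEX_M 1
#else
#define ON_CORTEX_M 0
#endif

/* Save PRIMASK, then mask interrupts. No-op on a host build. */
static inline uint32_t irq_save_and_mask(void)
{
#if ON_CORTEX_M
    uint32_t primask;
    __asm volatile ("mrs %0, primask\n\tcpsid i" : "=r"(primask) : : "memory");
    return primask;
#else
    return 0u;
#endif
}

/* Restore the PRIMASK value returned by irq_save_and_mask(). */
static inline void irq_restore(uint32_t primask)
{
#if ON_CORTEX_M
    __asm volatile ("msr primask, %0" : : "r"(primask) : "memory");
#else
    (void)primask;
#endif
}

/* Run fn(arg) with interrupts masked, then restore the previous state. */
static void run_with_irqs_masked(void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    uint32_t key = irq_save_and_mask();
    fn(arg);
    irq_restore(key);
}

/* Stand-in for the transmit loop under test: just counts its calls. */
static void example_body(void *arg)
{
    (*(uint32_t *)arg)++;
}
```

Saving and restoring the previous mask, rather than unconditionally re-enabling, keeps the wrapper safe to use from code that already runs with interrupts masked.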

    heisenbugAuthor
    Explorer II
    February 15, 2024

    Thanks all for the great help.

    @waclawek.jan disabling interrupts brought me from a 4us gap to 1.8us, thanks a lot for this. SPI DMA is actually disabled.

    @TDK writing a 32-bit constant does not help, I am still near a 1.8us gap.

    @Michal Dudka I am actually forced to stay in Zephyr with the default compilation options, that is -Os (I have to check whether I can optimize differently with some board-related setting). I changed the clock from 80 to 120MHz: no improvement, maybe a bit worse, the gap moved from 1.8us to 2.0us (tested with the p pointer removed, writing the 32-bit constant).

    Continuing the investigation, thanks

    Graduate II
    February 15, 2024

    @heisenbug wrote:

    disabling interrupts brought me from 4us gap to 1.8us, Thanks a lot for this. SPI DMA is actually disabled



    That looks like a good clue. Even an "empty" interrupt routine can take about 1 or 2us. My guess is that you have some SPI-related interrupt enabled (even though you don't use it) which fires every time you load new data into SPI ... maybe from RX etc. Check that.
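
One way to act on this is to read SPI_CR2 and look at the enable bits. The sketch below only decodes a register value that has already been read out. The bit positions are the ones I understand the STM32L4 family SPI_CR2 layout to use (ERRIE, RXNEIE, TXEIE at bits 5 to 7, DMA requests at bits 0 and 1); treat them as an assumption and check the reference manual for the exact part. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SPI_CR2 bit positions; verify against the reference manual. */
#define CR2_RXDMAEN (1u << 0)
#define CR2_TXDMAEN (1u << 1)
#define CR2_ERRIE   (1u << 5)
#define CR2_RXNEIE  (1u << 6)
#define CR2_TXEIE   (1u << 7)

static_assert(CR2_TXEIE == 0x80u, "TXEIE is expected at bit 7");

/* Returns the interrupt-enable bits that are set in a CR2 value. */
static uint32_t spi_cr2_irq_sources(uint32_t cr2)
{
    return cr2 & (CR2_ERRIE | CR2_RXNEIE | CR2_TXEIE);
}

/* Returns the DMA-request enable bits that are set in a CR2 value. */
static uint32_t spi_cr2_dma_requests(uint32_t cr2)
{
    return cr2 & (CR2_RXDMAEN | CR2_TXDMAEN);
}
```

A non-zero result from spi_cr2_irq_sources() would mean some SPI interrupt source is enabled and worth tracing to its handler. A zero result points away from the SPI peripheral's own interrupts and toward other sources in the system.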

    Super User
    February 15, 2024

    I meant, disable *all* DMA. And I also had other suggestions there, re-read my previous post and follow *all* of them.

    Or, better - start a new project, with nothing else, but setting up clocks, SPI pins and SPI itself, and try transmitting there.

    You don't execute from QSPI, do you?

    JW