Graduate
May 12, 2024
Question

Clock cycle shift on GPIO output STM32F103

  • 15 replies
  • 14832 views

Dear Community,

I am porting an old application from AVR to STM32, and I am facing a strange timing issue.

In a nutshell, the application reads a sector (512 bytes) from an SD card and outputs the content of the buffer to GPIO with a 4 µs cycle (meaning 3 µs low, 1 µs data signal).

The SD card read works fine, and I have written a small assembly routine to drive the GPIO with precise MCU cycle counting.

Using DWT in the debugger, it gives a very stable and precise count (288 cycles for a total of 4 µs).

However, when observing the output with a 24 MHz logic analyzer, I can see the signal shifted by 1 or 2 CPU cycles, and therefore a delay.

I have tried writing ODR directly as well as BSRR, but with no luck.

Attached :

- Screenshot of the logic analyzer

Screenshot 2024-05-12 at 06.30.59.png
As you can see, I do not get 3 µs but 3.042 µs, and this is not consistent.
 

Clock configuration

Screenshot 2024-05-12 at 06.32.34.png

Port configuration:

 

GPIO_InitStruct.Pin = GPIO_PIN_13| READ_PULSE_Pin|READ_CLK_Pin;
GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
GPIO_InitStruct.Pull = GPIO_NOPULL;
GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_HIGH;
HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
 
Assembly code : 
 
.global wait_1us
wait_1us: // busy-wait ~1 us at 72 MHz; clobbers r2
.fnstart
push {lr}
nop ;// 1 1
nop ;// 1 2
mov r2,#20 ;// 1 3 // loop counter: 20 iterations of subs+bne
wait_1us_1:
subs r2,r2,#1 ;// 1 1
bne wait_1us_1 ;// 1 2
pop {lr}
bx lr // return from function call
.fnend

.global wait_3us
wait_3us: // busy-wait ~3 us; the caller loads the loop count into r2 first
.fnstart
push {lr}
nop
nop
wait_3us_1:
subs r2,r2,#1
bne wait_3us_1
pop {lr}
bx lr // return from function call
.fnend
 
 
sendByte:

and r5,r3,#0x80000000 ;// 1 1 // isolate the MSB of r3 (next bit to send)
lsl r3,r3,#1 ;// 1 2 // left shift r3 by 1 to queue the following bit
subs r4,r4,#1 ;// 1 3 // decrement the r4 bit counter
//mov r6,#0 // Reset the DWT cycle counter for debug cycle counting
//ldr r6,=DWTCYCNT
//mov r2,#0
//str r2,[r6] // end
bne sendBit ;// 1 4
beq process ;// 1 5
// GPIO: Clk 14, Data (read pulse) 15, Enable 13
sendBit:
ldr r6,=PIN_BSRR ;// 2 2
LDR r2, [r6] ;// 3 5 // BSRR reads back as 0, so r2 starts as an empty mask
cmp r5,#0 ;// 1 6
ITE EQ ;// 1 7
ORREQ r2,r2, #0x80000000 ;// 1 8 // bit is 0: BSRR bit 31 resets pin 15 (data low)
ORRNE r2,r2, #0x00008000 ;// 1 9 // bit is 1: BSRR bit 15 sets pin 15 (data high)
ORR r2,r2, #0x00004000 ;// 1 10 // BSRR bit 14 sets pin 14 (clock high)
STR r2, [r6] ;// 1 11 // write BSRR -> from this point we need 1us, 72 CPU cycles (to be confirmed)
bl wait_1us ;// 65 75 144 209
ORR r2,r2, #0xC0000000 ;// 1 12 // BSRR bits 30/31 reset pins 14 and 15
STR r2,[r6] ;// 1 13
// Adjust the duration of the 3us wait if this is the first bit (coming from process, less 10 cycles)
cmp r4,#1
ite eq
moveq r2,#56 // loop count passed to wait_3us
movne r2,#62
bl wait_3us // wait for 3 us in total
b sendByte

 

To be honest, I do not know where to look.

 

    This topic has been closed for replies.


    Graduate II
    May 14, 2024

    Use a logic analyzer or oscilloscope with a much higher sampling frequency than your MCU frequency, or you will not be able to distinguish measurement artifacts from real MCU output jitter. Imagine that you are generating a perfect square wave at 72 MHz / 20 = 7.2 MHz (period 277.8 ns). If you sample this signal with a 24 MHz analyzer (period 41.67 ns), then you will see a jittering output: two consecutive periods of 6 × 41.67 ns (250 ns) and one period of 7 × 41.67 ns (291.7 ns) - on a perfect square wave! And that can lead you to the wrong conclusion that the jitter comes from the MCU output ...

    vbesson (Author)
    Graduate
    May 14, 2024

    Hello Michal, 

    Thanks, and I genuinely agree with you. However, it is not easy to find a logic analyzer with a sampling rate above 24 MHz.

    What I do is look at the data stream over the whole transmission period, where I should not see significant drift. Unfortunately, I do see a delay, and not a stable 250 kHz data frequency.

    Vincent 

    Graduate II
    May 14, 2024

    An ordinary oscilloscope should handle it with ease. As others have already written: use SPI, USART, or Timer + DMA, and you will get an easy, seamless pulse stream with minimal CPU load.

    vbesson (Author)
    Graduate
    May 14, 2024

    I will do some test and get back to this thread. 

    Quick question: if I use DMA buffering with USART, I cannot have a stop bit after each byte - is there a way to remove the stop bit? It also means I would set the clock speed so that one bit lasts 1 µs (72 cycles), and I would rearrange the data stream as 0.0.0.DATABITS.

     

    Graduate II
    May 14, 2024

    How is the data sampled on the receiving side? On which edge? I do not see any sensible setup/hold time guard!

    vbesson (Author)
    Graduate
    May 14, 2024

    I need to bit-stream data over a 1-wire interface with a 1 µs data pulse (high or low), then 3 µs with the data line low, no start bit, no stop bit. What is best? SPI? USART seems to have start and stop bits.

    Graduate II
    May 14, 2024

    @vbesson wrote:

    I need to bit-stream data over a 1-wire interface with a 1 µs data pulse (high or low), then 3 µs with the data line low, no start bit, no stop bit. What is best? SPI? USART seems to have start and stop bits.


    1-wire doesn't have critical timing at all. A 1 has a low pulse of 1-15us and a 0 is a low pulse of 60us. Nanosecond dither is irrelevant.

    vbesson (Author)
    Graduate
    May 22, 2024

     

    Hello All, 

    Quick update on my test based on the feedback you gave me.

    What I have tested:

    • Double buffer DMA with USART
    • Double buffer DMA with SPI
    • Double buffer DMA with GPIO & BSR
    • Disable all IRQ with bit bang SPI and ASM
    • Reducing the clock speed to avoid congestion on the bus

    and combination of all the above.

    My feedback:

    USART was a great approach to reduce the buffer size; indeed, I needed a 2048-byte buffer (512 bytes × 4 clock cycles).

    The issue with USART is the pause between bytes: even without stop bits, the USART waits a few cycles between bytes. So I cannot use this approach, as I need a continuous stream where each bit takes 3 µs low plus 1 µs of data.

     

    SPI: same as USART, giving the same results. The good thing is that, using DMA, I see more accuracy in the data stream.

     

    Disabling all IRQs and bit-banging SPI with GPIO output in ASM: disabling the IRQs does not change anything; the accuracy is not there. This is the most frustrating part - an ASM function should behave identically cycle for cycle... there must be a way to do it. Maybe ST can help and provide a more detailed explanation.

     

    Reducing the clock speed: it has no effect on accuracy, and in any case I need the CPU speed to manage the SD card over SPI while driving the GPIO output, since I am not bit-banging both SPI and GPIO.

     

    Double-buffer DMA with GPIO & BSRR: this is for the moment the best approach, even if from a memory perspective it is pretty ugly. For the record, I have a buffer of 402 bytes to send on a 4 µs cycle (3 µs delay, 1 µs data cycle), i.e. 13 chunks of 32 bytes. Since BSRR is a uint32, I needed a uint32 buffer of 2048 entries = 8192 bytes (64 bytes × 8 bits × 4 timer cycles × 4 bytes per uint32!). What I could do, but have not done yet, is write ODR directly so each entry is a uint16, dividing the buffer size by 2. I need to test this, as my program is not finished and I will need more memory to manage the OLED screen.
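    A minimal sketch of that (untested) ODR variant, under the thread's assumptions: 64 data bytes per DMA window (64 × 8 × 4 = 2048 slots), pin 14 = clock and pin 15 = data, and `initODRBuffer`/`DMA_ODR_BUFFER` as illustrative names. Note that an ODR write drives all 16 port pins at once, so any other GPIOC outputs would need to be folded into the constants:

```c
#include <stdint.h>

#define DMA_BUFFER_SIZE 64  /* data bytes per DMA window, per the math above */

/* 2048 half-word entries = 4096 bytes, half the BSRR version */
static uint16_t DMA_ODR_BUFFER[DMA_BUFFER_SIZE * 8 * 4];

void initODRBuffer(const char *buffer)
{
    const uint16_t IDLE     = 0x0000;                         /* clock low, data low   */
    const uint16_t CLK_DATA = (uint16_t)((1u<<14)|(1u<<15));  /* clock high, data high */
    const uint16_t CLK_ONLY = (uint16_t)(1u<<14);             /* clock high, data low  */

    int l = 0;
    for (int j = 0; j < DMA_BUFFER_SIZE; j++) {
        unsigned char c = (unsigned char)buffer[j];
        for (int k = 0; k < 8; k++) {
            DMA_ODR_BUFFER[l]   = IDLE;  /* 3 x 1us of line low ... */
            DMA_ODR_BUFFER[l+1] = IDLE;
            DMA_ODR_BUFFER[l+2] = IDLE;
            /* ... then the 1us data slot, MSB first */
            DMA_ODR_BUFFER[l+3] = (c & 0x80) ? CLK_DATA : CLK_ONLY;
            c = (unsigned char)(c << 1);
            l += 4;
        }
    }
}
```

    The DMA destination would then be `&GPIOC->ODR` with half-word data width instead of `&GPIOC->BSRR` with word width.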

    NB: I scratched my head over the DMA interrupt not triggering. I had used

     

    HAL_DMA_Start(&hdma_tim2_up, (uint32_t)DMA_BUFFER, (uint32_t)&(GPIOC->BSRR), 2048);
    
    //instead of
    
    HAL_DMA_Start_IT(&hdma_tim2_up, (uint32_t)DMA_BUFFER, (uint32_t)&(GPIOC->BSRR), 2048);
     
     

     

    This is the way I prepare the buffer :

     

    void initeDMABuffer(char * buffer){ // TODO check number of CPU cycles in C and assembly

        uint32_t GPIO_14L_15L = 0xC0000000; // BSRR: reset 14 & 15 -> no data pulse, no clock
        uint32_t GPIO_14H_15H = 0x0000C000; // BSRR: set 14 & 15 -> data HIGH, clock HIGH
        uint32_t GPIO_14H_15L = 0x80004000; // BSRR: set 14, reset 15 -> data LOW, clock HIGH

        char c = 0;
        int l = 0;

        for (int j = 0; j < DMA_BUFFER_SIZE; j++){ // each byte becomes 8 bits x 4 x 1us steps
            c = buffer[j]; // DMA to GPIO runs on a 1us period, i.e. 72 clock cycles on an STM32F103
            for (int k = 0; k < 8; k++){
                // computed up front as an optimization
                DMA_BUFFER[l]   = GPIO_14L_15L; // cycle 1: wait
                DMA_BUFFER[l+1] = GPIO_14L_15L; // cycle 2: wait
                DMA_BUFFER[l+2] = GPIO_14L_15L; // cycle 3: wait

                if (c & 0x80) // test if bit 7 (MSB of the current byte) is 1
                    DMA_BUFFER[l+3] = GPIO_14H_15H; // 4th slot: 1us data cycle, data high
                else
                    DMA_BUFFER[l+3] = GPIO_14H_15L; // bit is 0: 1us data cycle, data low
                c = c << 1; // shift the next bit into the MSB position
                l += 4;
            }
        }
    }

     

    This is the half-buffer preparation during the DMA cycle:

     

    void populateHalfDMABuffer(char * buffer, int pos, int half){

        // GPIO13 -> chip enable (active low)
        // GPIO14 -> clock pulse
        // GPIO15 -> data pulse

        // buffer corresponds to the sector char buffer,
        // pos is the current position in the buffer (%64),
        // half selects the first (0) or second (1) half of the DMA array.
        // Only slot l+3 is rewritten: the three wait slots filled by
        // initeDMABuffer never change.

        uint32_t GPIO_14H_15H = 0x0000C000;
        uint32_t GPIO_14H_15L = 0x80004000;

        char c = 0;
        unsigned int l = half * 1024;
        unsigned int bsize = DMA_BUFFER_SIZE / 2;
        for (int i = 0; i < bsize; i++){
            c = buffer[pos+i];
            for (int j = 0; j < 8; j++){
                if (c & 0x80)
                    DMA_BUFFER[l+3] = GPIO_14H_15H;
                else
                    DMA_BUFFER[l+3] = GPIO_14H_15L;
                c = c << 1;
                l += 4;
            }
        }
    }

     

     

    These are my 2 DMA Buffer callback functions:

     

    volatile int ClusterSlice;

    void HAL_DMA_HalfTxIntCallback(DMA_HandleTypeDef *hdma)
    {
        if (ClusterSlice < 13){
            ClusterSlice++;
            // Half the buffer has been transmitted: refill the first half
            populateHalfDMABuffer(sectorBuf, ClusterSlice*DMA_BUFFER_SIZE/2, 0);
        }else{
            __disable_irq();
            HAL_TIM_Base_Stop_DMA(&htim2);
            __enable_irq();
            prepareNewSector = 1;
        }
    }

    void HAL_DMA_FullTxIntCallback(DMA_HandleTypeDef *hdma)
    {
        if (ClusterSlice < 13){
            ClusterSlice++;
            populateHalfDMABuffer(sectorBuf, ClusterSlice*DMA_BUFFER_SIZE/2, 1);
        }else{
            __disable_irq();
            /* might not be necessary */
            //hdma_tim2_up.XferCpltCallback = NULL;
            HAL_TIM_Base_Stop_DMA(&htim2);
            __enable_irq();
            prepareNewSector = 1;
            //printf("End of DMA\n");
        }
        // end of the initial buffer
    }

     

    In the end, this is the output on the logic analyzer:

    Full 402-byte data chunk: Screenshot 2024-05-22 at 06.23.38.png

    What is left to do:

    - Manage disk head movement based on a GPIO interrupt (and then move to the right SD card sector and cluster)

    - Do some testing to see if the timing is accurate enough.

    I will keep you posted on how I progress. 

    Vincent

     

    vbesson (Author)
    Graduate
    May 28, 2024

    Hello All, 

    the delay between two data chunks (of 512 bytes) is causing some trouble.

    I am heading toward using DMA SPI to send the bytes.

    I am having some issues with the DMA interrupts and I need a little help.

    I want both the half-transfer and the transfer-complete DMA interrupts.

    I did 

     hdma_spi1_tx.XferHalfCpltCallback=HAL_DMA_HalfSpiTxIntCallback;
     hdma_spi1_tx.XferCpltCallback=HAL_DMA_FullSpiTxIntCallback;
     
     HAL_SPI_Transmit_DMA(&hspi1,DMA_BIT_BUFFER,1608); 

    The interrupts never get called...

    Should I use this instead?

     hdma_spi1_tx.XferHalfCpltCallback=HAL_DMA_HalfSpiTxIntCallback;
     hdma_spi1_tx.XferCpltCallback=HAL_DMA_FullSpiTxIntCallback;
     
     //HAL_SPI_Transmit_DMA(&hspi1,DMA_BIT_BUFFER,1608); // 402*8*4
     HAL_SPI_Transmit_IT(&hdma_spi1_tx,DMA_BIT_BUFFER,1608);

    In that case, I assume I have to attach the timer to hdma_spi1_tx?

    Thanks for your help 

    Vincent 

     

     

     

    vbesson (Author)
    Graduate
    May 28, 2024

    OK, I found it. HAL_SPI_Transmit_DMA installs its own internal DMA callbacks, overwriting anything assigned to hdma_spi1_tx.XferCpltCallback; those internal handlers then call the weak HAL_SPI_Tx*CpltCallback functions, so those are the ones to override.

    It needs to have

    void HAL_SPI_TxCpltCallback(SPI_HandleTypeDef *hspi)
    {
     printf("debug full\n");
    }
    
    
    void HAL_SPI_TxHalfCpltCallback(SPI_HandleTypeDef *hspi){
     printf("debug half\n");
    }

    along with

     HAL_SPI_Transmit_DMA(&hspi1,DMA_BIT_BUFFER,1608); // 402*8*4

    By the way, SPI TX over DMA seems to be the best approach: it saves RAM and is very precise and accurate.

     

    Vincent