Visitor II
December 30, 2019
Question

There is a bug in the way TIM works on the stm32h750, but on the stm32f407 everything works fine. Please check the results on your controller.


#include "stm32h7xx_hal.h"

extern void SystemClock_Config_16MHz(void);

uint32_t DBG32[10];

//======================
void delay(uint32_t wait){
 while(wait--);
}

//==================
void Test_TIM(){

 HAL_Init();

 SystemClock_Config_16MHz(); // my crystal = 16 MHz => PLL1 = 400 MHz

 TIM_HandleTypeDef TimHandle;

 __HAL_RCC_TIM2_CLK_ENABLE();

 TimHandle.Instance = TIM2;
 TimHandle.Init.Period = (uint32_t)-1;
 TimHandle.Init.Prescaler = 0; // == 200 MHz
 TimHandle.Init.ClockDivision = 0;
 TimHandle.Init.CounterMode = TIM_COUNTERMODE_UP;
 TIM_Base_SetConfig(TimHandle.Instance, &TimHandle.Init);

 __HAL_TIM_ENABLE(&TimHandle);

 // --------
 __disable_irq();
 TIM2->CNT = 0;

 DBG32[0]=TIM2->CNT;
 delay(1);
 DBG32[1]=TIM2->CNT;
 delay(1);
 DBG32[2]=TIM2->CNT;
 delay(1);
 DBG32[3]=TIM2->CNT;
 delay(1);
 DBG32[4]=TIM2->CNT;

 delay(3);
 DBG32[5]=TIM2->CNT;
 delay(3);
 DBG32[6]=TIM2->CNT;
 delay(3);
 DBG32[7]=TIM2->CNT;
 delay(3);
 DBG32[8]=TIM2->CNT;

 while(1);

 // DBG32[0..8] = 2,42,94,134,172,236,284,310,342
 // delta = DBG32[i]-DBG32[i-1] == 40,52,40,38, 64,48,26,32 !! nonsense !!
 // on stm32f407 == all OK, but on stm32h750 == bug !!!
}

    7 replies

    Graduate II
    December 30, 2019

    It's not as if the superscalar CPU is running time backward.

    Make the loop iterator volatile so the compiler doesn't fold/remove the loop.
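
The suggestion above can be sketched as follows. This is only an illustration, not the original poster's code: the name `delay_volatile` and the returned iteration count are assumptions added here so the loop can be checked on a host machine.

```c
#include <stdint.h>

/* Busy-wait for `wait` loop iterations.
 * Because `remaining` is volatile, the compiler must perform every read
 * and write of it, so the loop cannot be folded away or removed even
 * with optimization enabled.
 * Returns the number of iterations actually executed (hypothetical
 * addition, only there to make the behaviour observable). */
static uint32_t delay_volatile(uint32_t wait)
{
    volatile uint32_t remaining = wait;
    uint32_t iterations = 0u;

    while (remaining != 0u) {
        remaining--;
        iterations++;
    }
    return iterations;
}
```

Note that volatile only guarantees the loop is kept; it says nothing about how many CPU cycles each iteration takes.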

    ignatyy (Author)
    Visitor II
    December 30, 2019

    Thanks for the answer. The assembler listing shows that the compiler does not delete anything, and step-by-step debugging confirms this. The reason is something else.

    Graduate II
    December 30, 2019

    >>The reason is something else.

    Like superscalar, cache-line width, branch prediction?

    What's your issue here? That your software delay doesn't produce consistent numbers, based on where and how the function is called?

    Running out of FLASH or ITCM RAM?

    GNU tools?

    The CM4F and CM7 are decidedly different architectures.

    Super User
    December 30, 2019

    High execution speed in the Cortex-M7 is achieved by a combination of caching, pipelining, branch prediction and, as Clive said above, superscalar execution. Each of these makes the execution time of individual instructions depend heavily on context, so it is far from constant, and of course these latencies add up.

    This is not simply a clocked-up microcontroller.

    JW

    ignatyy (Author)
    Visitor II
    December 30, 2019

    Caching is disabled.

    For DBG32[5..8] the code is absolutely identical, yet the results differ by a factor of 2!

    Super User
    December 30, 2019

    > Caching is disabled.

    You've excluded one of the many sources of latency/jitter. There are more - FLASH latency, prefetch vs. fetch match, ART if you are executing through it. AXI latencies/arbitration. Access to TIM goes through AXI-to-AHB and AHB-to-APB bridges, there may be different clocks, and there are resynchronisations.

    > DBG32 [5..8] the code is absolutely identical,

    It's not; for example, it is running from different addresses. And, as I said above, most of the latency sources are context dependent (i.e. they depend on history, mutual relationships etc.).

    What you see is normal. High processing power comes at the cost of loss of control. Accept it.

    JW
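
Since the individual deltas are expected to vary, one way to look at such a measurement is to summarize the spread of consecutive counter readings instead of expecting a constant. A minimal sketch, assuming the readings are already stored in an array such as DBG32; the names `delta_stats` and `summarize_deltas` are hypothetical and not part of the HAL:

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of the differences between consecutive counter readings. */
typedef struct {
    uint32_t min_delta;  /* smallest difference seen */
    uint32_t max_delta;  /* largest difference seen */
    uint32_t total;      /* sum of all differences */
} delta_stats;

/* Walk an array of counter readings and collect min, max and sum of
 * readings[i] - readings[i-1]. Unsigned subtraction keeps the result
 * correct across a single 32-bit counter wrap.
 * With fewer than two readings there are no deltas, and min_delta
 * stays at UINT32_MAX. */
static delta_stats summarize_deltas(const uint32_t *readings, size_t count)
{
    delta_stats s = { UINT32_MAX, 0u, 0u };

    for (size_t i = 1u; i < count; i++) {
        uint32_t d = readings[i] - readings[i - 1u];
        if (d < s.min_delta) {
            s.min_delta = d;
        }
        if (d > s.max_delta) {
            s.max_delta = d;
        }
        s.total += d;
    }
    return s;
}
```

Fed with the readings from the question (2, 42, 94, 134, 172, 236, 284, 310, 342), this reports the range of timer ticks per step rather than a single figure.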

    Visitor II
    December 30, 2019

    Did you turn the optimization all the way off?

    I wonder what the purpose of this code is. I wouldn't rely on reading the timer counter register on the fly like that, and I have always tried to avoid using blocking delays. The vast complement of timers, DMAs, DMAMUX and NVIC on the H7 is what gives it its real-time behavior, not the other way around.

    It's not an 8-bit PIC or AVR.

    ignatyy (Author)
    Visitor II
    January 3, 2020

    I am posting part of the assembler listing. Executing the subroutine delay(1) should take at most 6 CPU cycles (this is the worst case!), plus 8 CPU cycles for the call and for reading and writing TIM. The total should be no more than 14 CPU cycles, yet the value DBG32[1] = 46 indicates 92 real CPU cycles.

    #include "stm32h7xx_hal.h"
     
    extern void SystemClock_Config_16MHz(void);
     
    uint32_t DBG32[20];
     
     
    //======================
    void delay(uint32_t wait){
     while(wait--);
    }
     
    //==================
    // IAR optimization = none
    //===============
    void Test_TIM(){
     
     //SCB_DisableICache();
     //SCB_DisableDCache();
     
     HAL_Init(); 
     
     SystemClock_Config_16MHz(); // my crystal = 16 MHz => PLL1 = 400 MHz
     
     TIM_HandleTypeDef TimHandle;
     
     __HAL_RCC_TIM2_CLK_ENABLE();
     
     TimHandle.Instance = TIM2;
     TimHandle.Init.Period = (uint32_t)-1; 
     TimHandle.Init.Prescaler =0;// == 200MHz;
     TimHandle.Init.ClockDivision = 0;
     TimHandle.Init.CounterMode = TIM_COUNTERMODE_UP;
     TIM_Base_SetConfig(TimHandle.Instance, &TimHandle.Init);
     
     __HAL_TIM_ENABLE(&TimHandle);
     
     // -------- 
     __disable_irq();
     TIM2->CNT =0;
     
     // -- delay(1) ----
     DBG32[0]=TIM2->CNT; 
     delay(1);
     DBG32[1]=TIM2->CNT; 
     delay(1);
     DBG32[2]=TIM2->CNT; 
     delay(1);
     DBG32[3]=TIM2->CNT;
     delay(1); 
     DBG32[4]=TIM2->CNT;
     
     // -- delay(10) ----
     delay(10);
     DBG32[5]=TIM2->CNT; 
     delay(10);
     DBG32[6]=TIM2->CNT; 
     delay(10);
     DBG32[7]=TIM2->CNT;
     delay(10); 
     DBG32[8]=TIM2->CNT;
     
     // delta= DBG32[i]-DBG32[i-1]
     for(uint8_t i=1; i<=8; i++) DBG32[i-1]= DBG32[i] - DBG32[i-1]; 
     DBG32[8]= 0;
     
     while(1);
     
     // result: delta= DBG32[i]-DBG32[i-1]== 42,46,32,40, 40,40,48,56 !! nonsense !!
     
     
     //========================
     // part of listing asm
     //=========================
     
     // void delay(uint32_t wait){
     // while(wait--);
     /*
    delay: // 5 == CPU cycles!!
    ??delay_0:
     MOVS R1,R0
     SUBS R0,R1,#+1
     CMP R1,#+0
     BNE.N ??delay_0
     BX LR ;; return
     
     //------- 
     // delay(1);
     // DBG32[5]=TIM2->CNT; 
     //-------
     
     // -- 8 == CPU cycles ---
     // delay(1);
     MOVS R0,#+1
     BL delay
     // DBG32[1]=TIM2->CNT; 
     LDR R0,[R5, #+0]
     STR R0,[R4, #+4]
     
     // total CPU cycles = 6+8 = 14 !!!
     // ! ! ! value DBG32[1]=46 === 92 CPU cycles ! ! !
     ??? question: where is the pipeline speed-up ???
     
     */
    }

    Question: where is the superscalar pipeline?
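
The conversion used in the post above (46 timer ticks read as 92 CPU cycles) follows from the clock figures in the code comments: a 400 MHz core clock and a 200 MHz timer clock. A small sketch of that arithmetic, assuming those two frequencies; the function name `timer_ticks_to_cpu_cycles` is hypothetical:

```c
#include <stdint.h>

/* Convert a number of timer ticks into CPU cycles, given both clock
 * frequencies in Hz. The multiplication is done in 64 bits so that
 * ticks * cpu_hz cannot overflow. Returns 0 if timer_hz is 0. */
static uint64_t timer_ticks_to_cpu_cycles(uint32_t ticks,
                                          uint32_t cpu_hz,
                                          uint32_t timer_hz)
{
    if (timer_hz == 0u) {
        return 0u;
    }
    return ((uint64_t)ticks * cpu_hz) / timer_hz;
}
```

With these clocks one timer tick spans two CPU cycles, so the counter cannot resolve anything finer than that, and the expected 14 CPU cycles would correspond to 7 ticks.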