Associate II
July 11, 2024
Solved

STM32MP13 DK Bare Metal project performance issue

Hi,

I'm getting started with STM32MP13 bare-metal projects, following the examples provided in the STM32CubeMP13 package.

Everything works fine, except that I'm disappointed by the performance. If my theory is correct, the following code should change the pin status every second.

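void testCPUFreq(void)
{
    GPIO_InitTypeDef GPIO_InitStruct = {0};
    uint32_t i;

    /* Enable the GPIOH peripheral clock before configuring the pin. */
    __HAL_RCC_GPIOH_CLK_ENABLE();

    /* Configure PH6 as a push-pull output without a pull resistor. */
    GPIO_InitStruct.Pin = GPIO_PIN_6;
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
    GPIO_InitStruct.Pull = GPIO_NOPULL;
    GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_VERY_HIGH;
    HAL_GPIO_Init(GPIOH, &GPIO_InitStruct);

    while (1) {
        /* Empty busy-wait loop: CPU_FREQUENCY iterations between toggles. */
        for (i = 0; i < CPU_FREQUENCY; i++) {
        }
        HAL_GPIO_TogglePin(GPIOH, GPIO_PIN_6);
    }
}
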
I made my measurements using a logic analyzer connected to the pin in question.

  • When I run the test on an STM32F7 target running at 200MHz, the pin status changes every second (which is the expected result).
  • When I run this code on the STM32MP13 DK in bare metal, running at 650MHz and using the template provided by ST, the pin status changes every 8 seconds.
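
As a rough sanity check on these two measurements, the toggle period can be converted into CPU cycles spent per loop iteration. The sketch below assumes that CPU_FREQUENCY equals the core clock in Hz on each target (200000000 and 650000000), which the post does not state explicitly. The helper name cycles_per_iteration is hypothetical.

```c
/* Hypothetical helper: average CPU cycles spent per loop iteration,
 * given the core clock (Hz), the number of loop iterations between two
 * toggles, and the toggle period measured on the logic analyzer (s). */
static double cycles_per_iteration(double core_hz, double iterations,
                                   double period_s)
{
    /* Total cycles elapsed during one period, divided by the loop count. */
    return (core_hz * period_s) / iterations;
}
```

Under that assumption, cycles_per_iteration(200e6, 200e6, 1.0) gives 1 cycle per iteration for the STM32F7, and cycles_per_iteration(650e6, 650e6, 8.0) gives 8 cycles per iteration for the STM32MP13.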

For the STM32MP13, I use the following configuration: I'm not using DDR, and I build with the MMU_USE and CACHE_USE preprocessor directives defined.

In addition, I'm using the following defines:

#define PREFETCH_ENABLE 1U
#define INSTRUCTION_CACHE_ENABLE 1U
#define DATA_CACHE_ENABLE 1U

Have I forgotten a configuration setting needed for optimum performance?

Is anyone else experiencing performance problems with STM32MP13 bare metal?

Thanks for your reply.

2 replies

PatrickF (Best answer)
Technical Moderator
July 11, 2024

Hi @Clement7

It is likely that the code in SYSRAM is not in a cacheable region.

Note that the overall performance of such code will be much better on the Cortex-M7 inside the STM32F7 than on the Cortex-A7 inside the STM32MP13 (but not by a factor of 8!).

Performance also depends on the bus clock frequencies (e.g. the AXI and AHB clocks used for SYSRAM and for GPIO), the compiler (GCC gives slightly lower performance than IAR or Keil/Arm) and the optimization level (use -O2 or -O3).

For reference, the CoreMark/MHz of the Cortex-M7 is above 5 (with IAR, I guess), while it is around 3.2 (GCC) for the Cortex-A7 (Cortex-A is tailored more for complex multi-threaded usage than for pure real-time).
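
Since the duration of an empty loop depends on all of the factors listed above, a delay that must last a fixed time is usually better tied to a timer tick than to an iteration count. The following is only a minimal sketch of that idea, not code from the STM32CubeMP13 package: the names tick_fn, delay_ms, fake_ticks and fake_now_ms are hypothetical, and the fake tick source exists only so the logic can be tried on a host PC. On target, a millisecond tick function such as the HAL's HAL_GetTick() could be passed instead, assuming the tick has been set up by the project.

```c
#include <stdint.h>

/* Hypothetical tick source type: returns a free-running millisecond count. */
typedef uint32_t (*tick_fn)(void);

/* Busy-wait until at least duration_ms ticks have elapsed.
 * The unsigned subtraction stays correct when the counter wraps around. */
static void delay_ms(tick_fn now_ms, uint32_t duration_ms)
{
    uint32_t start = now_ms();

    while ((uint32_t)(now_ms() - start) < duration_ms) {
        /* Wait: elapsed time comes from the tick source, not the loop. */
    }
}

/* Host-side stand-in for a hardware tick: advances by one on every read. */
static uint32_t fake_ticks;

static uint32_t fake_now_ms(void)
{
    return fake_ticks++;
}
```

With this structure, the toggle period is set by the tick source, so cache configuration and optimization level only change how often the tick is polled.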

Regards.

Clement7 (Author)
Associate II
July 15, 2024

Hello @PatrickF 

Thank you for your fast reply and your clarifications. 

I don't know why I overlooked the optimization settings. Enabling optimization level 1 (-O1) with GCC was enough to make up for the x8 factor. Now the loop changes the pin status every second, as expected.
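
One caveat for anyone reusing this kind of test: with optimization enabled, a compiler is allowed to remove an empty loop whose counter is an ordinary variable, depending on the compiler version and optimization level. Declaring the counter volatile keeps the loop in place. The sketch below is a hypothetical helper (busy_loop is not part of the HAL), and because a volatile counter is read and written in memory on every iteration, the time per iteration will differ from the non-volatile version.

```c
#include <stdint.h>

/* Hypothetical helper: spin for a given number of iterations.
 * The volatile qualifier forces every read and write of the counter to
 * actually happen, so the optimizer cannot delete the empty loop. */
static uint32_t busy_loop(uint32_t iterations)
{
    volatile uint32_t i;

    for (i = 0; i < iterations; i++) {
        /* Intentionally empty. */
    }
    return i; /* Number of iterations actually executed. */
}
```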

Regards.