Explorer
March 11, 2024
Solved

Slow performance of custom external flash loader on STM32H735G-DK.


I made an external flash loader for the STM32H735G-DK. I'm able to read, write, and erase the flash. However, programming 1 MB of flash takes ~50 seconds, while the ST-provided loader takes around 7 seconds for the same job. Both use Octo-SPI, with the same clocks (verified) and DTR (verified). I am testing on the same board with the same ST-LINK, so the hardware is not the limiting factor.

Would anyone have an idea what method the ST loader uses to get such a huge speed improvement?

    Best answer by AS1956

    It seems to be working well in the loader context. I cannot take credit for this; I saw it in one of the examples from "stm32-external-loader-main". What surprises me a bit is that the constant used in the delay loop does not influence the speed of any of the operations (erase, program, verify). I tried a few radically different values with no noticeable difference. I guess the bulk of the time is spent in self-timed operations inside the memory.
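    If the time really is dominated by the memory's self-timed operations, the loop constant would not matter as long as the driver polls the device until it reports ready. Below is a minimal host-side sketch of that polling pattern. `flash_read_status()` is a hypothetical stand-in (simulated here) for a "read status register" command, and the write-in-progress flag is assumed to be bit 0, which should be checked against the data sheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATUS_WIP 0x01u /* write-in-progress flag, assumed to be bit 0 */

/* Simulation state: how many more polls the fake device stays busy. */
static uint32_t busy_polls_left = 5;
static uint32_t status_reads = 0;

/* Hypothetical "read status register" command, simulated for the host. */
static uint8_t flash_read_status(void)
{
    status_reads++;
    if (busy_polls_left > 0) {
        busy_polls_left--;
        return STATUS_WIP;
    }
    return 0;
}

/* Poll until the device clears WIP, giving up after max_polls reads.
   Returns true when the device is ready, false on timeout. */
static bool flash_wait_ready(uint32_t max_polls)
{
    assert(max_polls > 0);
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((flash_read_status() & STATUS_WIP) == 0) {
            return true;
        }
    }
    return false;
}
```

    With a loop like this, the wait ends as soon as the memory finishes, so any fixed delay between polls only adds overhead on top of the device's own program or erase time.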

    8 replies

    Graduate II
    March 11, 2024

    Maybe you are using some blocking HAL stuff?

    I would grab a scope and compare some signals.

    Super User
    March 11, 2024

    "STM provided loader takes around 7 sec for the same."

    Does this include erase of 1 MB?

     

    AS1956Author
    Explorer
    March 11, 2024

    14:59:38 : Memory Programming ...
    14:59:38 : Opening and parsing file: testbinary1M.bin
    14:59:38 : File : testbinary1M.bin
    14:59:38 : Size : 1.13 MB
    14:59:38 : Address : 0x90000000
    14:59:38 : Erasing memory corresponding to segment 0:
    14:59:38 : Erasing external memory sectors [0 18]
    14:59:43 : Download in Progress:
    14:59:46 : File download complete
    14:59:46 : Time elapsed during download operation: 00:00:07.732

     

     

    Yes

    Graduate II
    March 11, 2024

    Hard to know..

    ST uses 64KB sector erase, and 1 KB pages.

    If ST's loader is smaller in RAM, there is more room for payload data. Perhaps it also does less repeated initialization / reset?

    Some comparative logs at Verbose Level 3 might be informative.

    AS1956Author
    Explorer
    March 11, 2024

    logs included, if you wish to take a look.

    Thanks

    AS1956Author
    Explorer
    March 11, 2024

    too fast

    AS1956Author
    Explorer
    March 11, 2024

    As per the MX25LM51245G data sheet, the programming page is 256 bytes. Are you saying that ST somehow uses 1 KB?

    Graduate II
    March 11, 2024

    Actually, it reports taking multiples of 0x1000 / 4 KB (16 x 256) per Write() operation, so there are some operational efficiencies there. It decomposes these into 256-byte pages internally.
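    For illustration, here is one way a loader's write routine could break a large buffer into page-sized program operations. This is a sketch, not ST's actual code: `flash_page_program()` is a hypothetical primitive (here it only counts calls so the example runs on a host), and the 256-byte page size is taken from the data sheet figure mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_PAGE_BYTES 256u /* program page size per the data sheet */

static uint32_t page_program_calls = 0;
static uint32_t last_chunk_len = 0;

/* Hypothetical page-program primitive; a real one would issue the
   write-enable and page-program commands, then wait for completion. */
static int flash_page_program(uint32_t addr, const uint8_t *data, uint32_t len)
{
    (void)addr;
    (void)data;
    page_program_calls++;
    last_chunk_len = len;
    return 0;
}

/* Split an arbitrary write so that no chunk crosses a page boundary. */
static int flash_write(uint32_t addr, const uint8_t *data, uint32_t len)
{
    assert(data != NULL || len == 0);
    while (len > 0) {
        uint32_t room = FLASH_PAGE_BYTES - (addr % FLASH_PAGE_BYTES);
        uint32_t chunk = (len < room) ? len : room;
        int rc = flash_page_program(addr, data, chunk);
        if (rc != 0) {
            return rc; /* stop at the first failing page */
        }
        addr += chunk;
        data += chunk;
        len -= chunk;
    }
    return 0;
}
```

    A page-aligned 0x1000-byte call turns into 16 page programs, matching the 16 x 256 figure. An unaligned start gets a short first chunk so that later chunks line up with page boundaries.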

    Graduate II
    March 12, 2024

    You haven't told us yet if you are using HAL functions for your own programming.

    If yes, go through these and check for while() and HAL_Delay().

    And check if you are actually using the flash in octal mode. The speed difference is close to a factor of 8, so maybe you are using it in single bit/IO SPI mode.

    AS1956Author
    Explorer
    March 12, 2024

    Got 1 MB programming close to the ST loader: ~9.5 sec vs ~7 sec with the ST loader. It is good enough for now. You guys were right, it was a HAL issue.

    Thank you to all who commented.

    Graduate II
    March 12, 2024

    @AS1956  We are some curious folks here, so could you please give us a hint what the actual problem was and how you solved it? ;) 

    Thanks!

    AS1956Author
    Explorer
    March 12, 2024

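    Overrode __weak void HAL_Delay(uint32_t Delay) with:

    void HAL_Delay(uint32_t Delay){
      (void)Delay;  /* the requested delay is ignored */
      /* fixed busy-wait; volatile keeps the compiler from removing the empty loop */
      for (volatile int i=0; i<0x1000; i++);
    }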

    Works well.

    Graduate II
    March 12, 2024

    Wow, that's ... interesting.

    HAL_Delay() is used in many, many HAL functions, so you might "break" other HAL stuff with this modification.
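    One way to limit that risk is a delay that still honors the requested duration but polls a free-running counter instead of relying on a tick interrupt. The sketch below assumes such a counter exists. `counter_read()` is a hypothetical stand-in, simulated here so the code runs on a host; on the target it might read a hardware cycle counter or timer, and ticks would need converting to milliseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running counter; advances by 7 on every read. */
static uint32_t sim_counter = 0;

/* Hypothetical counter read; replace with a hardware counter on target. */
static uint32_t counter_read(void)
{
    sim_counter += 7u;
    return sim_counter;
}

/* Busy-wait for at least `ticks` counter ticks without any interrupt.
   Unsigned subtraction keeps the comparison correct across wrap-around. */
static void delay_ticks(uint32_t ticks)
{
    /* Keep the request well below the counter range so that an
       overshoot between polls cannot wrap past the target unseen. */
    assert(ticks <= UINT32_MAX / 2u);
    uint32_t start = counter_read();
    while ((uint32_t)(counter_read() - start) < ticks) {
        /* spin */
    }
}
```

    Code that depends on a minimum delay keeps working with this approach, while the wait no longer depends on a tick that may never advance inside the loader.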

     

    Super User
    March 12, 2024

    "External loaders" cannot use interrupts.