Graduate II
July 23, 2024

STM32 with small flash + HAL + DEBUG + no Optimization...

... leads to "funny" results:

STM32L011 with 16 kB of flash, HAL inits and LED toggle in main, DEBUG, no optimization:

93% of flash used

I had not expected it to be that bad! :grinning_face_with_sweat:

Okay, I will probably not use too much HAL, will use Release version with optimization,
but I better select an L01 in the next bigger QFN package (4x4) to get 32 kB of flash.

STM32L011_HAL_DEBUG_noOpt.png

    Graduate II
    July 23, 2024

    Your screenshot shows many peripherals, so maybe your LED toggle counts bitcoins too ...

     

    Super User
    July 23, 2024

    Show the results with "Optimise for Debug" (-Og).

    You should also take a look at the map file - is it actually HAL that's using all that space ... ?

    :thinking_face:

    Super User
    July 23, 2024

    The Memory Details tab gives you more insights.

    hth

    KnarfB

    LCEAuthor
    Graduate II
    July 24, 2024

    Yes, luckily I know all that - it was just a bad surprise and might deter some complete STM32 beginners.

    So, this wasn't actually a call for help, just ... being "amazed" at how things are. Until now I only worked with "bigger" STM32s (G4, F7, H7), where using HAL occasionally or at project start wasn't a problem.

     

    The list file showed mostly the "bloated" HAL inits - I actually forgot about the "Memory Details", thanks @KnarfB .

    With a Release build optimized for size (-Os), it drops to 56%.

    Record holder is HAL_RCC_OscConfig with 1.15 kB; together with HAL_RCC_ClockConfig and HAL_RCCEx_PeriphCLKConfig it takes up more than 10% of the 16 kB.
    These are the HAL setup functions I always use - I guess I have to overcome my laziness in that area.

     

    image.png

     

    BTW - RAM: the default CubeMx settings for heap and stack also took up ~90% :D
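The RAM figure is plausible from the reservations alone. A minimal sketch of the arithmetic, assuming the commonly generated CubeMX values of 0x200 bytes minimum heap and 0x400 bytes minimum stack, and the 2 KB of SRAM on the STM32L011. None of these numbers are stated in the thread, and `percent_of` is a hypothetical helper:

```c
#include <stdint.h>

/* Assumed values, not taken from the thread. */
#define RAM_BYTES        2048u   /* assumption: STM32L011 has 2 KB SRAM */
#define MIN_HEAP_BYTES   0x200u  /* assumption: typical CubeMX default  */
#define MIN_STACK_BYTES  0x400u  /* assumption: typical CubeMX default  */

/* Share of 'part' in 'total' in percent, rounded to the nearest integer.
 * Integer-only on purpose: no floating-point library gets pulled in. */
static uint32_t percent_of(uint32_t part, uint32_t total)
{
    return (part * 100u + total / 2u) / total;
}

/* RAM claimed by the heap and stack reservations alone. */
static uint32_t reserved_ram_percent(void)
{
    return percent_of(MIN_HEAP_BYTES + MIN_STACK_BYTES, RAM_BYTES);
}
```

Under those assumptions `reserved_ram_percent()` evaluates to 75, so the heap and stack reservations alone would claim three quarters of RAM before any .data or .bss is counted. Shrinking both values in the linker script or in the CubeMX project settings is the obvious first lever.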

     

    Graduate II
    July 24, 2024

    In MX you can very easily switch some parts to LL.

    LCEAuthor
    Graduate II
    July 25, 2024

    If this MCU will be used in a product, I will probably throw out the HAL stuff anyway (I don't like the LL either, so direct register settings for me).

    I was just really shocked that a basically empty project - except for the inits - with all defaults from CubeMX leads to this memory usage: > 90% for flash and RAM.
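For comparison, a register-level LED toggle needs only a few instructions. The following is a minimal sketch, not the poster's code: `gpio_regs_t` mirrors the GPIO register order used in the CMSIS device headers for STM32L0, `LED_PIN` is a hypothetical pin number, and `led_init`/`led_toggle` are hypothetical names. On a real target you would use the device header's own GPIO definitions and first enable the port clock in the RCC, which is omitted here. Taking the port as a pointer lets the logic be exercised on a PC against a plain struct.

```c
#include <stdint.h>

/* One GPIO port, registers in the order of the STM32L0 CMSIS headers.
 * Redefined here only so the sketch is self-contained. */
typedef struct {
    volatile uint32_t MODER;   /* pin mode, 2 bits per pin      */
    volatile uint32_t OTYPER;  /* output type                   */
    volatile uint32_t OSPEEDR; /* output speed                  */
    volatile uint32_t PUPDR;   /* pull-up / pull-down           */
    volatile uint32_t IDR;     /* input data                    */
    volatile uint32_t ODR;     /* output data                   */
    volatile uint32_t BSRR;    /* bit set / reset               */
    volatile uint32_t LCKR;    /* configuration lock            */
    volatile uint32_t AFR[2];  /* alternate function low / high */
    volatile uint32_t BRR;     /* bit reset                     */
} gpio_regs_t;

#define LED_PIN 1u /* hypothetical: whichever pin drives the LED */

/* Configure LED_PIN as a general-purpose output (MODER field = 01). */
static inline void led_init(gpio_regs_t *port)
{
    uint32_t moder = port->MODER;
    moder &= ~(3u << (LED_PIN * 2u)); /* clear both mode bits   */
    moder |= (1u << (LED_PIN * 2u));  /* 01 = output            */
    port->MODER = moder;
}

/* Invert the LED pin by read-modify-write of the output data register. */
static inline void led_toggle(gpio_regs_t *port)
{
    port->ODR ^= (1u << LED_PIN);
}
```

On hardware, the pointer passed in would be the port's base address from the device header; everything else in the port is left untouched by `led_init`.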

    Super User
    July 25, 2024

    @LCE wrote:

    I was just really shocked that a basically empty project - except for the inits.


    Probably, the inits are where 90% of the code lies!

    So what do you get if you don't init any peripherals?

    And then there's the C runtime support - how much did that amount to?

    LCEAuthor
    Graduate II
    July 25, 2024

    @Andrew Neil 

    > Probably, the inits are where 90% of the code lies!

    Definitely!

    I just switched to its next big brother with 32 kB flash and switched everything to LL instead of HAL.

    Release with no Opt: 9.57 kB flash

    Release with Opt for size: 4.14 kB flash  <- I hadn't expected that with LL

     

    > And then there's the C runtime support - how much did that amount to?

    Erm... where's that? Isn't that for memory allocation and stuff like that - which I don't use?

    Super User
    July 25, 2024

    @LCE wrote:

    @Andrew Neil 

    Erm... where's that? Isn't that for memory allocation and stuff like that - which I don't use?


    There's a lot more to it than that!

     

    LCEAuthor
    Graduate II
    July 25, 2024

    Okay, I'll check that.

    I'm far from being a software guy, still have more experience with 8-bitters and FPGAs than with STM32...

    Super User
    August 12, 2024

    > still have more experience with 8-bitters

    And do you use modern development environments together with clicking configurators and abstraction libraries with the 8-bitters?

    While I don't have first-hand experience with these, I'm quite confident these days you can find some of these being capable of filling up the FLASH of a low-end target mcu quite safely, too.

    It's the 21st century, after all.

    JW

    Super User
    August 12, 2024

    @waclawek.jan wrote:

    While I don't have first-hand experience with these, I'm quite confident these days you can find some of these being capable of filling up the FLASH of a low-end target mcu quite safely, too.


    You certainly can - just take a look at Atmel's (now Microchip's) ASF stuff for the AVRs ...

    And, again, there's the C runtime - just a printf() is a great way to fill a small micro's Flash.

    Or some floating-point ...
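One common way to keep printf() out of a small image when only simple numeric output is needed is a hand-written integer-to-decimal routine. A minimal sketch, where `u32_to_dec` is a hypothetical name and getting the resulting string out over a UART is left to the caller:

```c
#include <stddef.h>
#include <stdint.h>

/* Writes 'value' as decimal digits plus a terminating NUL into 'buf'.
 * 'buf' must hold at least 11 bytes (10 digits for UINT32_MAX + NUL).
 * Returns the number of digits written, not counting the NUL. */
static size_t u32_to_dec(uint32_t value, char *buf)
{
    char tmp[10]; /* digits are produced least significant first */
    size_t n = 0;

    do {
        tmp[n++] = (char)('0' + (value % 10u));
        value /= 10u;
    } while (value != 0u);

    /* Copy the digits back in the correct order. */
    for (size_t i = 0; i < n; i++) {
        buf[i] = tmp[n - 1u - i];
    }
    buf[n] = '\0';
    return n;
}
```

There is no format parsing and no floating point here. On a Cortex-M0+ the division still calls a small runtime helper, because the core has no hardware divide, but that is typically far smaller than a full printf().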

    Graduate II
    August 12, 2024

    Also note that CubeIDE by default compiles/links with --gc-sections as an "optimization". I bet without that flag being on by default, the skeleton binary wouldn't even fit in flash...

     

    That said, engineers usually pick these tiny parts either for high-volume, cost-sensitive applications or because they want to use the smallest package possible. I think it's a reasonable compromise to do development and debugging on a beefier part number from the same family, and then shoehorn the release version into the cheapest part number that will do for actual production. There's some risk involved in this approach, but it makes sense if you're working under certain kinds of constraints.

    Graduate II
    August 12, 2024

    Frequently SMALL is what I want. The 3x3 QFN20 is perfect; ST's not had a good game here. TSSOP20 is just too massive, and honestly I don't have the space-time to prototype with larger parts, nor do I want to deal with the higher costs/complexity of PCBs and BGAs with via-in-pad issues (fill and planarization). People demand boards the size of a postage stamp, not a postcard.

    MAXIM had some very small CM4F parts with relatively large FLASH: 96 MHz, 256 KB/96 KB, 4x4 mm. Overkill for my application for sure, but demonstrative of what the right process/geometry can deliver, beyond CM0(+). This is how you exorcise 8-bit MCUs, not by barely replacing them or forcing a redesign into 3V3. When you can pivot into an order of magnitude more effectiveness, the choices/decisions get a lot simpler for the guys with bigger hats.

    Graduate II
    August 12, 2024

    There's constant pressure from China driving cost down and capabilities up. JLCPCB recently made via-in-pad free for 6 layers and up, and has had a 6-layer deal going for months now that makes them cheaper than 4 layers. There's still the complexity of layout to consider, but if someone really needs to get small then switching from pins to balls ultimately makes sense.

     

    I've been meaning to try hand-assembling some WLCSP chips (they are marvels of miniature). But I'd probably have to go through caffeine withdrawal first, before attempting it - so maybe not.

     

    Whenever I see a WLCSP, I can't help but think (by way of contrast) of those old spy movies featuring "cutting-edge" East German phone bugs - the size of a Motorola brick phone.

    Graduate II
    August 12, 2024

    The 16KB part is pretty tight, especially with the float library pulled in.

    For the L011 there's likely a drop-in C0 or G0 that can maintain a small package with larger memory.

    STM32C011F6U6TR
    IC MCU 32BIT 32KB FLASH 20UFQFPN (3x3 mm)