LCE
Principal II
July 23, 2024
Question

STM32 with small flash + HAL + DEBUG + no Optimization...

  • July 23, 2024
  • 16 replies
  • 5662 views

... leads to "funny" results:

STM32L011 with 16 kB of flash, HAL inits and LED toggle in main, DEBUG, no optimization:

93% of flash used

I had not expected it to be that bad! :grinning_face_with_sweat:

Okay, I will probably not use too much HAL, will use Release version with optimization,
but I better select an L01 in the next bigger QFN package (4x4) to get 32 kB of flash.

STM32L011_HAL_DEBUG_noOpt.png


LCE (Author)
Principal II
August 12, 2024

Atmel's AVRs were my main MCU over the last 20 years or so... surely never used something like HAL or a library. :D

Having recently worked so much with the H7, I thought I'd try a small STM32. For that project I would have had to pick an AVR that is new to us anyway, so I might as well go STM32.

Anyway, I chose the L03 with 32 kB flash. And I love the Nucleos...

Firmware's almost done, I only use LL stuff for clock setup and ADC, otherwise direct register, no float for printf.

Biggest chunk in flash is now my standard debug UART interface with lots of const char strings for printf, but I'm far from the 32 kB limit, using size optimization though.
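As an illustration of the "no float for printf" approach mentioned above, here is a minimal sketch that converts an ADC reading to millivolts and prints it with integer math only. The helper names, the 12-bit resolution and the reference voltage parameter are assumptions made for the example, not taken from the poster's firmware.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: convert a 12-bit ADC reading to millivolts,
 * rounded to the nearest millivolt, using integer math only. */
uint32_t adc_to_millivolts(uint32_t raw, uint32_t vref_mv)
{
    return (raw * vref_mv + 2047u) / 4095u;
}

/* Hypothetical helper: format millivolts as "V.mmm V" without any
 * floating-point conversion, so printf needs no float support. */
int format_millivolts(char *buf, size_t len, uint32_t mv)
{
    return snprintf(buf, len, "%lu.%03lu V",
                    (unsigned long)(mv / 1000u),
                    (unsigned long)(mv % 1000u));
}
```

With a 3300 mV reference, a raw value of 2048 becomes 1650 mV and is formatted as "1.650 V". Whether this actually saves flash depends on the C library and linker settings in use.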

Tesla DeLorean
Guru
August 12, 2024

I've tried to migrate us from old SiLabs 8051s where possible; I need something for basic initialization / customization. It typically needs to be physically small, but workable, so BGA and WLCSP are generally not favoured by the crew in PCBA, and neither is the more costly PCB.

The 16 KB L011 worked well until feature creep set in: people wanted configuration and diagnostics for production. That was a bit too tight; 32 KB would have given more headroom, as all the libraries took their bite from the first 16 KB.

BarryWhit
Lead
August 12, 2024

Biggest chunk in flash is now my standard debug UART interface with lots of const char strings for printf, but I'm far from the 32 kB limit, using size optimization though.

 

I was sure something like this existed, and I was not disappointed:

https://github.com/atomicobject/heatshrink

The purported footprint of the decoder is about 1 kB of flash + 100 bytes of RAM (minimum; you can trade RAM usage for decoding speed).

(Update: also found Segger SMASH. I'm sure others exist. It's an obvious idea.)

 

If you have enough low-entropy data, it might provide a net gain. One approach to making the most efficient use of small controllers is to trade what you have in excess to get some more of what is scarce. Can you give away some SRAM to gain some Flash?
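To make the SRAM-for-flash trade concrete, here is a minimal sketch of a much simpler scheme than heatshrink: plain token substitution, where bytes of 0x80 and above in a stored message index a small dictionary of repeated substrings. This is not heatshrink's LZSS and not its API; the dictionary contents and the function name are hypothetical, and the message has to be expanded into a RAM buffer before printing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary of substrings that repeat across debug messages. */
static const char *const dict[] = {
    "ERROR: ",      /* token 0x80 */
    "ADC ",         /* token 0x81 */
    "timeout",      /* token 0x82 */
    " failed\r\n",  /* token 0x83 */
};
#define DICT_LEN (sizeof dict / sizeof dict[0])

/* Expand src into dst. Bytes >= 0x80 that map to a dictionary entry are
 * replaced by that entry; all other bytes are copied as-is. Output is
 * truncated if dst is too small. Returns the expanded length, excluding
 * the terminating NUL. */
size_t expand_message(char *dst, size_t dst_len, const char *src)
{
    size_t n = 0;

    if (dst_len == 0) {
        return 0;
    }
    for (; *src != '\0'; ++src) {
        uint8_t c = (uint8_t)*src;
        if (c >= 0x80u && (size_t)(c - 0x80u) < DICT_LEN) {
            const char *w = dict[c - 0x80u];
            for (; *w != '\0' && n + 1 < dst_len; ++w) {
                dst[n++] = *w;
            }
        } else if (n + 1 < dst_len) {
            dst[n++] = (char)c;
        }
    }
    dst[n] = '\0';
    return n;
}
```

A stored message such as `"\x80" "\x81" "\x82"` takes 4 bytes of flash including the terminator and expands to "ERROR: ADC timeout". The scheme only pays off if the dictionary entries are reused often enough to outweigh the decoder and the dictionary itself.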

 

LCE
LCE (Author)
August 13, 2024

Interesting stuff from all of you! :thumbs_up:

Yeah, the size... We have relatively small lot numbers, and in most products we have enough space to avoid WLCSP, BGA, and most often - if the manufacturer allows - even QFN, our "go-to" SMD format for passives is still 0603, nothing smaller if possible.
This recent project with the L0 is very space limited, but our production manager still begged me to stay away from BGA and anything smaller than 0603 - while the sales manager side is asking for more features, you know the game... :grinning_face_with_sweat:

Andrew Neil
Super User
September 25, 2024
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work.
LCE
LCE (Author)
September 26, 2024

Interesting that the compiler automatically assumes double-precision floating-point calculations; these "__aeabi_dxxx" library routines take up quite some space.

I found that I can suppress this by adding "-fsingle-precision-constant" to the gcc flags in the compiler settings:

LCE_0-1727330112817.png

 

Gave me 4 kB flash back.

Is that the correct way? Feels somehow not so elegant...
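One alternative to the global flag, sketched here under assumptions, is to keep the arithmetic in single precision at the source level by suffixing constants with `f`. In C, an unsuffixed constant such as `3.3` has type double, so an expression mixing it with a float is evaluated in double; on a core without an FPU that pulls in the software double routines. The function names and the 12-bit ADC scaling below are hypothetical.

```c
#include <stdint.h>

/* 3.3 and 4095.0 are double constants, so raw is converted to double and
 * the multiply and divide are done in double precision before the result
 * is converted back to float. */
float adc_to_volts_double(uint16_t raw)
{
    return raw * 3.3 / 4095.0;
}

/* The f suffix makes the constants float, so the whole expression has
 * type float and no double arithmetic is required by the language. */
float adc_to_volts_float(uint16_t raw)
{
    return (float)raw * 3.3f / 4095.0f;
}
```

GCC's `-Wdouble-promotion` warning can help find places where a float is implicitly promoted to double. Note that `-fsingle-precision-constant` changes the meaning of every unsuffixed constant in the translation units it is applied to, which may or may not be what you want.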

Andrew Neil
Super User
September 26, 2024

@LCE wrote:

Interesting that the compiler automatically assumes double floating point calculations,


Indeed:

https://mcuoneclipse.com/2019/03/29/be-aware-floating-point-operations-on-arm-cortex-m4f/

 

Visitor II
July 20, 2025

Try "Optimize for Debug" (-Og) and compare the resulting flash usage. You should also look at the map file to see whether HAL is actually taking up all that space...