G4 vs H5 speed
Hi,
We are running a crypto key generation library on both an STM32G474 (170 MHz) and an STM32H563 (240 MHz).
Over about 80 runs, the average generation time is around 12 seconds for the H5 and 38 seconds for the G4.
These times are fine for our application (although faster would be better), but I am curious why the G4 is over 3 times slower. Going by the clock speeds alone (170 MHz vs 240 MHz), I would have expected generation times only about 40% longer.
The arithmetic part of the library uses some assembler functions for long multiplications and, for both processors, these seem to have picked a section for "Armv6-M (or later) with DSP Instruction Set Extensions."
The compiler flags in use are listed below. Some sections are repeated because I still haven't figured out how CMake generates them, but I believe the relevant differences are:
-mcpu=cortex-m4 (G4) vs -mcpu=cortex-m33 (H5)
-mfpu=fpv4-sp-d16 (G4) vs -mfpu=fpv5-sp-d16 (H5)
compile C with /usr/bin/arm-none-eabi-gcc
__VERSION__ "14.2.1 20241119"
G4 Flags:
C_FLAGS = -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -O3 -DNDEBUG -std=c99 -Wall -Wextra -Wwrite-strings -Wmissing-prototypes -Wformat=2 -Wno-format-nonliteral -Wvla -Wlogical-op -Wshadow -Wformat-signedness -Wformat-overflow=2 -Wformat-truncation -O2 -Wmissing-declarations
H5 Flags:
C_FLAGS = -mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -O3 -DNDEBUG -std=c99 -Wall -Wextra -Wwrite-strings -Wmissing-prototypes -Wformat=2 -Wno-format-nonliteral -Wvla -Wlogical-op -Wshadow -Wformat-signedness -Wformat-overflow=2 -Wformat-truncation -O2 -Wmissing-declarations
Any thoughts would be welcome.
Thanks

