5 Hidden STM32 Features That Even Experienced Developers Miss
STM32 MCUs hide surprisingly useful hardware features that can save you CPU cycles, lower power, make multi-core life easier, and fix intermittent bugs — if you know they exist. This article reveals 5 “hidden” STM32 features (with concrete examples and code) that even experienced MCU engineers often forget, and exactly how to use them safely in production.
Feature 1 — Fast-mode Plus I²C (FM+, up to 1 MHz)
Fast-mode Plus (FM+) is an I²C mode that supports bus speeds up to 1 MHz, which is 10× faster than Standard mode (100 kHz) and 2.5× faster than Fast-mode (400 kHz).
Why does FM+ exist?
Fast-mode I²C at 400 kHz can become a bottleneck for:
- High-speed sensors
- High-refresh displays
- Real-time control loops
- ICs with large configuration register maps
- Interrupt-driven or polling-heavy systems
FM+ allows faster transfers without switching to SPI.
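To put the speed difference in numbers, here is a rough sketch that estimates the on-wire time of one write at each bus speed. It assumes 9 SCL clocks per byte (8 data bits plus ACK), counts one address byte, and ignores START/STOP timing and clock stretching. `i2c_transfer_us` is a hypothetical helper for illustration, not part of any ST library.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate on-wire time in microseconds for one I2C write.
 * Assumes 9 SCL periods per byte (8 data bits + ACK) and one address byte;
 * START/STOP timing and clock stretching are ignored. */
static uint32_t i2c_transfer_us(uint32_t payload_bytes, uint32_t scl_hz)
{
    assert(scl_hz != 0u);
    uint64_t bits = (uint64_t)(payload_bytes + 1u) * 9u; /* +1 for the address byte */
    return (uint32_t)((bits * 1000000u) / scl_hz);
}
```

Under these assumptions a 100-byte write takes about 9090 µs at 100 kHz, 2272 µs at 400 kHz, and 909 µs at 1 MHz.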
Why is FM+ “special” in STM32?
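STM32 families with the newer I²C peripheral (for example F0, F3, L4, G4, H7) provide:
- FM+ capable pins with a 20 mA drive option, enabled through SYSCFG
- An analog noise filter that suppresses short glitches on SDA and SCL
- A programmable digital noise filter
Example:
PB6/PB7 on STM32G4 have FM+ enable bits in SYSCFG. The classic STM32F4 I²C peripheral tops out at 400 kHz; only some F4 parts add a separate FMPI2C peripheral. Check the datasheet pin table for your part.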
Quick code hint (conceptual):
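// Sketch for STM32G4-style SYSCFG; bit names differ per family, check the reference manual
RCC->APB2ENR |= RCC_APB2ENR_SYSCFGEN; // SYSCFG clock must be enabled first
SYSCFG->CFGR1 |= SYSCFG_CFGR1_I2C_PB6_FMP | SYSCFG_CFGR1_I2C_PB7_FMP; // 20 mA drive on PB6/PB7
// Then configure I2C_TIMINGR for ~1 MHz using the CubeMX timing calculator or AN4235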
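The timing step can be sketched on a host as well. The helper below packs the documented I2C_TIMINGR fields and computes the nominal SCL low plus high time. The function names are hypothetical, and the sample inputs in the usage note (48 MHz kernel clock, PRESC=5, SCLDEL=1, SDADEL=0, SCLH=1, SCLL=3) are illustrative assumptions, not validated settings; real values should come from the CubeMX calculator or AN4235.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of I2C_TIMINGR on STM32 parts with the newer I2C peripheral:
 * PRESC[31:28], SCLDEL[23:20], SDADEL[19:16], SCLH[15:8], SCLL[7:0]. */
static uint32_t i2c_timingr_pack(uint32_t presc, uint32_t scldel, uint32_t sdadel,
                                 uint32_t sclh, uint32_t scll)
{
    assert(presc <= 0xFu && scldel <= 0xFu && sdadel <= 0xFu);
    assert(sclh <= 0xFFu && scll <= 0xFFu);
    return (presc << 28) | (scldel << 20) | (sdadel << 16) | (sclh << 8) | scll;
}

/* Nominal SCL low + high time in nanoseconds, before the synchronisation
 * delays and rise/fall times that the real bus adds on top. */
static uint32_t i2c_scl_nominal_ns(uint32_t i2cclk_hz, uint32_t presc,
                                   uint32_t sclh, uint32_t scll)
{
    assert(i2cclk_hz != 0u);
    uint64_t ticks = (uint64_t)(presc + 1u) * ((scll + 1u) + (sclh + 1u));
    return (uint32_t)((ticks * 1000000000u) / i2cclk_hz);
}
```

With the illustrative inputs, `i2c_timingr_pack(5, 1, 0, 1, 3)` gives `0x50100103` and the nominal low plus high time is 750 ns, which leaves headroom for edge and synchronisation delays inside a 1 µs period.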
Feature 2 — Hardware Semaphores (HSEM) — great for multi-core STM32
Hardware Semaphores are special hardware registers that act like locks.
Two cores (e.g., Cortex-M7 and Cortex-M4 in STM32H7) can use these locks to coordinate access to:
- Shared peripherals
- Shared RAM regions
- Shared communication buffers
- Shared interrupt events
- Boot coordination
This prevents race conditions and makes multi-core programming reliable.
Why do we need Hardware Semaphores?
In dual-core STM32 MCUs:
- Both cores can read/write memory
- Both cores can access I²C, SPI, GPIO
- Both cores can modify shared buffers
Without protection, both cores may:
- Write to the same memory at the same time
- Configure a peripheral at the same time
- Interrupt each other unexpectedly
- Cause corrupted data, freezes, or hard faults
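A hardware semaphore ensures that only ONE core owns a resource at a time, provided both cores take the semaphore before touching the resource. The HSEM does not block access by itself; it is a cooperative lock.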
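The ownership rule can be illustrated with a small host-side model. This is a simplified stand-in for one semaphore register, not the real peripheral: the type and function names are hypothetical, the bit positions follow the H7-style layout (LOCK in bit 31, core ID starting at bit 8), and the process ID field is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEM_LOCK        (1u << 31)             /* LOCK flag, bit 31 */
#define SEM_COREID(id)  ((uint32_t)(id) << 8)  /* owner core ID field */

/* Simplified stand-in for one HSEM register; 0 means "free". */
typedef struct { uint32_t r; } sem_model_t;

/* Models a 1-step (read) lock: if free, the semaphore is taken for core_id.
 * The returned value is the register content after the attempt. */
static uint32_t sem_read_lock(sem_model_t *s, uint32_t core_id)
{
    assert(s != NULL);
    if ((s->r & SEM_LOCK) == 0u) {
        s->r = SEM_LOCK | SEM_COREID(core_id);
    }
    return s->r;
}

/* True when core_id holds the semaphore after the attempt. */
static bool sem_try_take(sem_model_t *s, uint32_t core_id)
{
    return sem_read_lock(s, core_id) == (SEM_LOCK | SEM_COREID(core_id));
}

/* Release only has an effect when requested by the owning core. */
static void sem_release(sem_model_t *s, uint32_t core_id)
{
    assert(s != NULL);
    if (s->r == (SEM_LOCK | SEM_COREID(core_id))) {
        s->r = 0u;
    }
}
```

In this model, if core 3 takes the semaphore first, `sem_try_take` for core 1 keeps returning false until core 3 calls `sem_release`; a release attempted by core 1 in the meantime changes nothing.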
Example Pattern:
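// 1-step (read) lock, STM32H7-style: reading RLR takes the semaphore if it is free
if (HSEM->RLR[semid] == (HSEM_RLR_LOCK | HSEM_CR_COREID_CURRENT)) {
    // ... this core now owns the resource; use it ...
    HSEM->R[semid] = HSEM_CR_COREID_CURRENT; // release: write our COREID with LOCK = 0
}
// Macro names as used by the H7 HAL; it wraps these steps as HAL_HSEM_FastTake() and HAL_HSEM_Release()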
Feature 3 — Burst DMA
Burst DMA means the DMA controller transfers multiple data items in one go (a burst) instead of transferring them one-by-one.
Why are bursts faster?
Each individual DMA transfer requires:
- an address fetch
- a bus request
- a bus arbitration
- a memory access
A burst pays the request and arbitration cost once and then moves several beats back-to-back in one bus transaction.
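A toy cost model makes the saving visible. It is only a sketch: `dma_cycles` is a hypothetical helper, and the per-transaction `overhead` is a parameter you choose, not a measured figure for any STM32 bus.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cost model: every bus transaction pays `overhead` cycles for
 * request and arbitration, plus one cycle per data beat. */
static uint32_t dma_cycles(uint32_t beats, uint32_t burst_len, uint32_t overhead)
{
    assert(burst_len > 0u);
    uint32_t transactions = (beats + burst_len - 1u) / burst_len; /* round up */
    return transactions * overhead + beats;
}
```

With an assumed overhead of 3 cycles, moving 64 beats costs 256 cycles as single transfers, 112 cycles as 4-beat bursts, and 76 cycles as 16-beat bursts in this model.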
Typical burst sizes:
- Single (1 beat)
- INCR4 (4 beats)
- INCR8 (8 beats)
- INCR16 (16 beats)
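On the F2/F4/F7-style stream DMA these four sizes map to 2-bit codes in the MBURST and PBURST fields of DMA_SxCR. The sketch below builds those fields on a host; the helper names and local `SXCR_*` defines are hypothetical, and the bit positions are taken from that DMA variant, so verify them against your device header before reusing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions in DMA_SxCR on the F2/F4/F7-style stream DMA.
 * Local names are used so the sketch compiles on a host. */
#define SXCR_PBURST_POS 21u
#define SXCR_MBURST_POS 23u

/* Returns the 2-bit burst code for a beat count, or -1 if unsupported. */
static int dma_burst_code(unsigned beats)
{
    switch (beats) {
    case 1:  return 0; /* single transfer */
    case 4:  return 1; /* INCR4 */
    case 8:  return 2; /* INCR8 */
    case 16: return 3; /* INCR16 */
    default: return -1;
    }
}

/* Writes the combined MBURST/PBURST bits to *out; false on an invalid size. */
static bool dma_sxcr_burst_bits(unsigned mem_beats, unsigned periph_beats,
                                uint32_t *out)
{
    assert(out != NULL);
    int m = dma_burst_code(mem_beats);
    int p = dma_burst_code(periph_beats);
    if (m < 0 || p < 0) {
        return false;
    }
    *out = ((uint32_t)m << SXCR_MBURST_POS) | ((uint32_t)p << SXCR_PBURST_POS);
    return true;
}
```

For example, INCR4 on the memory side with single transfers on the peripheral side yields `0x00800000`.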
Benefits:
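- Higher memory bandwidth
- Lower bus overhead
- Well suited to high-speed peripherals
- Less total bus time spent on DMA, which leaves more bandwidth for the CPU (a burst itself cannot be interrupted, so keep bursts short if CPU latency matters)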
When to use Burst DMA?
- Large data arrays
- Audio buffers
- Image frames
- ADC multi-channel sequences
- SPI/I2S streaming
- Memory-to-memory copies
Feature 4 — DMA FIFO Mode
Each DMA stream has a small First-In-First-Out (FIFO) buffer, up to 4 words (16 bytes) deep depending on the MCU.
The FIFO acts as a staging buffer, allowing the DMA to:
- accumulate data before writing
- pack bursts more efficiently
- avoid misaligned transfers
- avoid bus stalls
- adapt to different source and destination widths
Modes:
- Direct mode (FIFO disabled)
- FIFO mode enabled
- FIFO threshold: ¼, ½, ¾, full
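Threshold and burst size cannot be combined freely. On the F4/F7-style stream DMA with a 16-byte FIFO, our reading of the reference manual's threshold table is that a memory burst is allowed only when the threshold level holds a whole number of bursts. The check below encodes that rule; `dma_fifo_burst_ok` is a hypothetical helper and should be confirmed against the table for your part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_FIFO_BYTES 16u /* 4 words, as on the F4/F7-style stream DMA */

/* threshold_quarters: 1 = 1/4, 2 = 1/2, 3 = 3/4, 4 = full.
 * burst_beats: 4, 8 or 16 (INCR4/8/16). msize_bytes: 1, 2 or 4.
 * A memory burst is usable when the threshold level holds a whole
 * number of bursts. */
static bool dma_fifo_burst_ok(unsigned threshold_quarters, unsigned burst_beats,
                              unsigned msize_bytes)
{
    assert(threshold_quarters >= 1u && threshold_quarters <= 4u);
    if (burst_beats != 4u && burst_beats != 8u && burst_beats != 16u) {
        return false;
    }
    if (msize_bytes != 1u && msize_bytes != 2u && msize_bytes != 4u) {
        return false;
    }
    unsigned level_bytes = threshold_quarters * (DMA_FIFO_BYTES / 4u);
    unsigned burst_bytes = burst_beats * msize_bytes;
    return (level_bytes % burst_bytes) == 0u;
}
```

Under this rule, INCR4 with 32-bit memory accesses only works with a full threshold, while INCR4 with byte accesses works at every threshold.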
Why is FIFO useful?
Without it (direct mode), the DMA must forward each element as soon as it arrives, which is inefficient for:
- different source/destination widths (e.g., 8→32 bit)
- slow peripheral buses
- high burst rates
FIFO example:
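The ADC delivers 16-bit samples, and you want 32-bit writes to memory.
Without FIFO (direct mode) → the memory width is forced to match the peripheral width, so every sample costs its own 16-bit write.
With FIFO → the DMA packs two samples into each 32-bit word and can write them out in bursts.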
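What the FIFO does with the samples can be mimicked in plain C. The hypothetical `pack_u16_to_u32` below shows the resulting memory layout, assuming little-endian packing with the first sample in the low half of each word; it models the data layout only, not the DMA timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs pairs of 16-bit samples into 32-bit words, first sample in the
 * low half (little-endian order). Returns the number of words written;
 * a trailing unpaired sample is left out. */
static size_t pack_u16_to_u32(const uint16_t *src, size_t n_samples, uint32_t *dst)
{
    assert(src != NULL && dst != NULL);
    size_t words = n_samples / 2u;
    for (size_t i = 0; i < words; ++i) {
        dst[i] = (uint32_t)src[2u * i] | ((uint32_t)src[2u * i + 1u] << 16);
    }
    return words;
}
```

Four samples `0x1111, 0x2222, 0x3333, 0x4444` become the two words `0x22221111` and `0x44443333`, halving the number of memory writes.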
Feature 5 — Double-Buffer Mode (Ping-Pong Mode)
Double-buffer mode allows DMA to use two memory buffers:
- Buffer A
- Buffer B
While DMA is filling one buffer, the CPU can process the other.
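This solves a major real-time problem: data that arrives while the CPU is still processing the previous block is not lost, as long as processing finishes before the next buffer swap.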
Perfect for:
- Audio or voice streaming
- Continuous ADC sampling
- UART RX with high throughput
- Real-time DSP
- High-speed USB
- Camera or video frame capture
Example:
- DMA fills Buffer A
- CPU processes Buffer B
- DMA finishes → buffer swap
- CPU now processes A
- DMA now fills B
This is called ping-pong or circular double-buffering.
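The swap sequence can be sketched as a host-side simulation in which `pp_push` stands in for the DMA and `pp_sum` for the CPU's processing step. All names and the buffer length are hypothetical; on real hardware the swap is done by the DMA itself and signalled by the transfer-complete interrupt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PP_LEN 4u /* samples per buffer; small for illustration */

typedef struct {
    uint16_t buf[2][PP_LEN];
    unsigned fill;  /* index of the buffer the producer is filling (0 or 1) */
    size_t   pos;   /* next write position in that buffer */
} pingpong_t;

/* Producer side (stands in for the DMA). Returns the index of the buffer
 * that just became ready for the CPU, or -1 if none completed. */
static int pp_push(pingpong_t *pp, uint16_t sample)
{
    assert(pp != NULL);
    pp->buf[pp->fill][pp->pos++] = sample;
    if (pp->pos < PP_LEN) {
        return -1;
    }
    int ready = (int)pp->fill;
    pp->fill ^= 1u; /* swap: start filling the other buffer */
    pp->pos = 0;
    return ready;
}

/* Consumer side: example processing step (sum of one buffer). */
static uint32_t pp_sum(const pingpong_t *pp, int idx)
{
    assert(pp != NULL && (idx == 0 || idx == 1));
    uint32_t sum = 0;
    for (size_t i = 0; i < PP_LEN; ++i) {
        sum += pp->buf[idx][i];
    }
    return sum;
}
```

After every `PP_LEN` samples, `pp_push` reports which buffer is complete, and the consumer can work on it while new samples land in the other one.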
How these 3 DMA features work together
The most powerful DMA setup is:
Burst + FIFO + Double-buffer
Example use cases:
- High-speed ADC sampling → DSP pipeline
- Ethernet RX/TX descriptors
- I2S audio codec
- SPI display streaming
- Camera interface (DCMI)
What each mode contributes:
- Double-buffer mode ensures no pauses or data loss
- FIFO mode ensures bus efficiency and alignment
- Burst mode ensures maximum throughput
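As a rough sketch, the combined setup for a 16-bit peripheral streaming into memory could be expressed as the register values below. The `CR_*` and `FCR_*` names are local, hypothetical copies of DMA_SxCR and DMA_SxFCR bit positions for the F2/F4/F7-style stream DMA so that the example builds on a host; on target, use the macros from your device header and add the channel selection, buffer addresses, and enable bit.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of DMA_SxCR / DMA_SxFCR bit positions for the
 * F2/F4/F7-style stream DMA, so the sketch builds on a host. */
#define CR_MBURST_INCR4  (1u << 23)  /* 4-beat memory bursts */
#define CR_DBM           (1u << 18)  /* double-buffer mode */
#define CR_MSIZE_32      (2u << 13)  /* 32-bit memory accesses */
#define CR_PSIZE_16      (1u << 11)  /* 16-bit peripheral accesses */
#define CR_MINC          (1u << 10)  /* increment memory address */
#define CR_TCIE          (1u << 4)   /* transfer-complete interrupt */
#define FCR_DMDIS        (1u << 2)   /* direct mode disabled = FIFO on */
#define FCR_FTH_FULL     (3u << 0)   /* threshold: full FIFO */

typedef struct { uint32_t cr; uint32_t fcr; } dma_stream_cfg_t;

/* Peripheral-to-memory stream: 16-bit peripheral reads, 32-bit INCR4
 * memory bursts from a full FIFO, ping-pong between two buffers. */
static dma_stream_cfg_t dma_adc_stream_cfg(void)
{
    dma_stream_cfg_t cfg;
    cfg.cr  = CR_MBURST_INCR4 | CR_DBM | CR_MSIZE_32 | CR_PSIZE_16
            | CR_MINC | CR_TCIE;
    cfg.fcr = FCR_DMDIS | FCR_FTH_FULL;
    assert((cfg.fcr & FCR_DMDIS) != 0u); /* bursts need FIFO mode */
    return cfg;
}
```

The full threshold is chosen because a 4-beat burst of 32-bit words is exactly 16 bytes, which is the whole FIFO on this DMA variant.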
