CubeMX Generated ThreadSafe/stm32_lock.h - Potential Bug
I've burned a significant amount of time tracking down an issue, so I thought I would share in the hope of helping others and maybe improving ST's provided newlib locks.
(This is probably in the wrong place, please move it as required.)
I'm using the ThreadSafe files (stm32_lock.h & newlib_lock_glue.c) generated by CubeMX 6.6.1 to make newlib thread safe.
I'm utilising Strategy 4: "Allow lock usage from interrupts. Implemented using FreeRTOS locks." (because I'm using FreeRTOS but also the ST USB device library which by default calls malloc from an ISR).
Strategy 4 implements newlib's __retarget_lock_etc suite of hooks to lock critical sections inside newlib.
To test how robust this solution is I created 3 (exact number not critical, but I had 3 LEDs for debugging...) threads which all repeatedly malloc() and free() in infinite loops.
To my dismay it would not be long before a crash/instability occurred.
Often the particular "canary" to be killed would be STM32_LOCK_ASSERT_VALID_NESTING_LEVEL inside stm32_lock_acquire() or stm32_lock_release().
I tracked the issue to corrupted/illogical values for the contents of the lock (LockingData_t) in particular the nesting level itself.
/**
* @brief Acquire STM32 lock
* @param lock The lock to acquire
*/
static inline void stm32_lock_acquire(LockingData_t *lock)
{
STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
STM32_LOCK_ASSERT_VALID_NESTING_LEVEL(lock);
lock->basepri[lock->nesting_level++] = taskENTER_CRITICAL_FROM_ISR();
}It turns out (upon inspecting the assembly) that GCC decided to carry out the lock->nesting_level++ increment BEFORE disabling interrupts (setting the basepri interrupt mask in my case).
Therefore it was possible for multiple contending threads to each increment lock->nesting_level in their call to stm32_lock_acquire() before the leading thread would eventually turn off interrupts, by which point the nesting_level value would be higher than it logically should be. Shenanigans soon follow.
The incrementing of nesting_level should be inside the critical region, but in my case GCC had other ideas.
The fix for me was simple:
/**
* @brief Acquire STM32 lock
* @param lock The lock to acquire
*/
static inline void stm32_lock_acquire(LockingData_t *lock)
{
STM32_LOCK_BLOCK_IF_NULL_ARGUMENT(lock);
STM32_LOCK_ASSERT_VALID_NESTING_LEVEL(lock);
lock->basepri[lock->nesting_level] = taskENTER_CRITICAL_FROM_ISR();
lock->nesting_level++;
}This convinced the compiler to increment inside the critical region, with interrupts safely off.
I would recommend ST make this change to their released code.
