Graduate II
February 6, 2020
Question

[BUG] STM32 HAL driver lock mechanism is not interrupt safe

  • 12 replies
  • 13210 views

The STM32 HAL driver library is full of flawed and sub-optimal constructs. The most common one, which impacts almost all drivers, is the lock mechanism. It's a bad and limiting design, and getting rid of it requires a major rewrite, but the worst part is that it's not even interrupt safe and therefore doesn't provide the locking for which it was introduced. The current __HAL_LOCK() code (reformatted) looks like this:

#define __HAL_LOCK(__HANDLE__) \
do{ \
	if((__HANDLE__)->Lock == HAL_LOCKED) \
	{ \
		return HAL_BUSY; \
	} \
	else \
	{ \
		(__HANDLE__)->Lock = HAL_LOCKED; \
	} \
}while (0U)

Between testing and setting ->Lock, an interrupt can occur and also test and set ->Lock. Both the main thread and the interrupt will then continue execution as if the object were unlocked, and the interrupt will unlock it before the main thread has completed its "locked" part, which leaves it open to further interrupt calls. The same happens when a higher-priority interrupt preempts a lower-priority one. Additionally, the ->Lock variable is not marked volatile and is therefore prone to being reordered by compiler optimization.
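The race window can be reproduced on a host with a minimal sketch. Everything except the HAL_LOCKED/HAL_UNLOCKED/HAL_BUSY names is an assumption made for illustration: the Handle type, naive_lock(), and the irq_hook pointer that stands in for an interrupt firing between the test and the set are hypothetical and not part of the HAL.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { HAL_UNLOCKED = 0, HAL_LOCKED = 1 } HAL_LockTypeDef;
typedef enum { HAL_OK = 0, HAL_BUSY = 2 } HAL_StatusTypeDef;

typedef struct { HAL_LockTypeDef Lock; } Handle;   /* hypothetical handle */

/* Hypothetical hook standing in for an interrupt that fires between
   the test and the set. NULL means "no interrupt pending". */
static void (*irq_hook)(Handle *h);

/* Same test-then-set logic as the current __HAL_LOCK(), as a function. */
static HAL_StatusTypeDef naive_lock(Handle *h)
{
    assert(h != NULL);
    if (h->Lock == HAL_LOCKED) {
        return HAL_BUSY;
    }
    if (irq_hook != NULL) {              /* the race window */
        void (*fire)(Handle *) = irq_hook;
        irq_hook = NULL;                 /* fire only once */
        fire(h);
    }
    h->Lock = HAL_LOCKED;
    return HAL_OK;
}

static HAL_StatusTypeDef isr_status;

/* Simulated ISR: takes the lock, does its work, unlocks again. */
static void fake_isr(Handle *h)
{
    isr_status = naive_lock(h);
    if (isr_status == HAL_OK) {
        h->Lock = HAL_UNLOCKED;
    }
}

/* Returns 1 if both contexts believed they owned the lock. */
static int race_demo(void)
{
    Handle h = { HAL_UNLOCKED };
    isr_status = HAL_BUSY;
    irq_hook = fake_isr;
    HAL_StatusTypeDef main_status = naive_lock(&h);
    return (main_status == HAL_OK) && (isr_status == HAL_OK);
}
```

In this sketch race_demo() reports that both the "main thread" and the simulated ISR got HAL_OK for the same handle, which is exactly the failure described above.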

The proposed fix is simple and requires adding only a few lines of code to the stm32XXxx_hal_def.h files for all STM32 series:

#define __HAL_LOCK(__HANDLE__) \
do { \
	uint32_t rPriMask = __get_PRIMASK(); \
	__disable_irq(); \
	if ((__HANDLE__)->Lock == HAL_UNLOCKED) { \
		(__HANDLE__)->Lock = HAL_LOCKED; \
		__set_PRIMASK(rPriMask); \
	} else { \
		__set_PRIMASK(rPriMask); \
		return HAL_BUSY; \
	} \
} while (0)
 
typedef volatile enum {
	HAL_UNLOCKED = 0x00U,
	HAL_LOCKED = 0x01U
} HAL_LockTypeDef;

Thus this will make that bad construct at least interrupt safe and actually provide locking as was intended.

Note that __HAL_UNLOCK() code doesn't need modifications as it already is atomic.


    Visitor II
    February 7, 2020

    I think __HAL_LOCK/__HAL_UNLOCK would be unnecessary if the user called HAL functions in the correct way (no concurrent access to the same hardware resource from multiple threads/ISRs). All the additional overhead in this function would then also be unnecessary.

    So __HAL_LOCK should never return HAL_BUSY, because the calling HAL function is then aborted with this error, which leads to follow-on errors!

    A loop would reveal the problem immediately during debugging (it causes a deadlock on concurrent access):

    #define __HAL_LOCK(__HANDLE__) \
    do{ \
    	if((__HANDLE__)->Lock != HAL_LOCKED) \
    	{ \
    		(__HANDLE__)->Lock = HAL_LOCKED; \
    		break; \
    	} \
    }while (1U)
     

    Explorer
    February 9, 2020

    This isn't the fix IMO.

    __HAL_LOCK should not be used by interrupt.

    Apps should be sensibly threaded, i.e. the driver is used either entirely by one thread or (general case), if the driver/device has receive and transmit channels and receive uses interrupt, one thread does overall init and receive and the other does transmit.

    ONLY if a critical section is necessary during de/re-init, it should be plainly written and visible in the code, not hidden in a macro.

    Visitor II
    February 19, 2020

    Hello,

    This point is already at the top of the improvements that will be introduced shortly in our HAL, according to the HAL evolution and improvement roadmap. Note that the lock mechanism approach will be fully reworked to distinguish between lock process and critical section (lock resources).

    So actually, in the HAL we will apply three mechanisms :

    • State machine (Functional sequence)
    • Lock process (lock full operation, prevent spurious process overlap on the same hardware instance or sub instance)
    • critical section : prevent two or more ‘threads’ accessing the same data in parallel

    Thus the old HAL_Lock/Unlock macros will be reworked and extended. Stay tuned!

    Rds

    Maher

    Explorer
    February 20, 2020

    The lock mechanism is a protection against incorrect implementation?

    Don't lose sight of its purpose, or of the fact that developers are also asking for a leaner/faster HAL.

    Developers want means to disable locks and argument checking.

    In Cube perhaps it'd be configured in the HAL Settings pane in Project Manager.

    Its macro, e.g. HAL_DISABLE_CHECKS, default 0 (checks are on), would be added to stm32h7xx_hal_conf.h.
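A minimal sketch of how such a switch could look, assuming the HAL_DISABLE_CHECKS name suggested above. The DEMO_HAL_LOCK/DEMO_HAL_UNLOCK macros and the demo_start()/demo_complete() functions are hypothetical stand-ins, not existing HAL symbols, and the lock body is the original non-atomic one because only the compile-time switch is being illustrated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch as proposed above: 0 = checks on (default). */
#ifndef HAL_DISABLE_CHECKS
#define HAL_DISABLE_CHECKS 0
#endif

typedef enum { HAL_UNLOCKED = 0, HAL_LOCKED = 1 } HAL_LockTypeDef;
typedef enum { HAL_OK = 0, HAL_BUSY = 2 } HAL_StatusTypeDef;
typedef struct { HAL_LockTypeDef Lock; } PPP_HandleTypeDef;

#if (HAL_DISABLE_CHECKS == 0)
#define DEMO_HAL_LOCK(h)                                  \
    do {                                                  \
        if ((h)->Lock == HAL_LOCKED) { return HAL_BUSY; } \
        (h)->Lock = HAL_LOCKED;                           \
    } while (0)
#define DEMO_HAL_UNLOCK(h) do { (h)->Lock = HAL_UNLOCKED; } while (0)
#else
/* Checks disabled: the lock compiles to nothing. */
#define DEMO_HAL_LOCK(h)   do { (void)(h); } while (0)
#define DEMO_HAL_UNLOCK(h) do { (void)(h); } while (0)
#endif

/* Hypothetical driver entry point: locks the handle and starts an operation. */
static HAL_StatusTypeDef demo_start(PPP_HandleTypeDef *h)
{
    assert(h != NULL);
    DEMO_HAL_LOCK(h);
    /* ... start the transfer; completion unlocks the handle ... */
    return HAL_OK;
}

/* Hypothetical completion path, e.g. called from a transfer-complete handler. */
static void demo_complete(PPP_HandleTypeDef *h)
{
    assert(h != NULL);
    DEMO_HAL_UNLOCK(h);
}
```

With the default setting a second demo_start() on a busy handle returns HAL_BUSY; building with -DHAL_DISABLE_CHECKS=1 removes the check and its cost entirely.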

    Visitor II
    February 21, 2020

    That's a very good idea with a #define to adjust the behavior of __HAL_LOCK/__HAL_UNLOCK mechanism.

    Several levels from aborting (current behavior), waiting/blocking, complete disable etc. would be conceivable.

    It's on the developer to implement a correct application structure and to avoid concurrent access to the same hardware.

    There are several synchronisation mechanisms (e.g. flags and semaphores) to serialize hardware access in one task.

    __HAL_LOCK/__HAL_UNLOCK should help to find the problem in application structure but should not try to solve it automatically.

    PiranhaAuthor
    Graduate II
    February 19, 2020

    The comments from @SStor and @alister raise another general question: which HAL functions are allowed to be called from interrupt context? Obviously HAL_PPP_IRQHandler() is allowed because that is its sole purpose, but what about other functions? The guys assume that no other function is (intended to be) allowed, but is that true? I can't find a clear explanation anywhere. For example, let's look at UM1905. "Figure 7: HAL driver model" on page 60 seems to show that HAL_PPP_Process() can be called from interrupt context. A bit lower is the text:

    Basically, the HAL driver APIs are called from user files and optionally from interrupt handlers file when the APIs based on the DMA or the PPP peripheral dedicated interrupts are used.

    This says that it's allowed but doesn't specify which ones. @MMAST.1​, someone from ST should clarify this here in a forum and also document it explicitly!

    PiranhaAuthor
    Graduate II
    February 19, 2020

    Related topics...

    https://community.st.com/s/question/0D50X0000C8eyb9SQA/bug-the-hallockx-macro-needs-a-lock

    This topic cites HAL code comment, which shows that HAL_UART_DMAStop() is intended to be called from interrupt context:

    https://community.st.com/s/question/0D50X0000C3CcW6SQK/can-we-call-hallock-within-interrupt-

    Visitor II
    February 19, 2020

    All HAL APIs can be called from ISR context except the polling process ones like HAL_UART_Transmit, as they follow the polling model. That said, the HAL is built on a full process model (start --> end of operation) with an asynchronous notification in the case of the DMA/IT model, so those processes are protected through either the state machine or the lock mechanism while the process is busy. Once a process has finished or been forced to abort, it is unlocked and another process on the same resources (instances, channels, etc.) can take place. Processes can of course be launched in parallel when they act on different instances or sub-processes (channel, endpoint, Tx/Rx transmission/reception, etc.).

    Rds

    Maher

    Explorer II
    October 10, 2020

    The APIs that use the lock can't be called from interrupt context, as the lock is implemented with an RTOS semaphore.

    A semaphore can be released from an IT but can't be taken (an IT can't be delayed).

    The HAL_UART_DMAStop source code clearly explains this:

    /* The Lock is not implemented on this API to allow the user application
       to call the HAL UART API under callbacks HAL_UART_TxCpltCallback() / HAL_UART_RxCpltCallback():
       when calling HAL_DMA_Abort() API the DMA TX/RX Transfer complete interrupt is generated
       and the correspond call back is executed HAL_UART_TxCpltCallback() / HAL_UART_RxCpltCallback()
       */

    For example in UART HAL the following functions use lock:

    HAL_UART_RegisterCallback

    HAL_UART_UnRegisterCallback

    HAL_UART_Transmit_IT

    HAL_UART_Receive_IT

    HAL_UART_Transmit_DMA

    HAL_UART_Receive_DMA

    HAL_UART_DMAPause

    HAL_UART_DMAResume

    HAL_LIN_SendBreak

    HAL_MultiProcessor_EnterMuteMode

    HAL_MultiProcessor_ExitMuteMode

    HAL_HalfDuplex_EnableTransmitter

    HAL_HalfDuplex_EnableReceiver

    So most of the API is not usable from interrupt.

    Perhaps the HAL API should report an error if such an API is called from an interrupt context

    Some RTOSes raise an assert when a forbidden API is used in interrupt context. The HAL could build on that.
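A minimal host-side sketch of such a guard. On Cortex-M the IPSR register is non-zero while an exception handler is running, and on target it would be read with the CMSIS intrinsic __get_IPSR(). The sim_ipsr variable, in_isr(), demo_transmit_it() and call_from_fake_isr() are hypothetical names used only for this illustration and are not part of the HAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { HAL_OK = 0, HAL_ERROR = 1, HAL_BUSY = 2 } HAL_StatusTypeDef;

/* Host stand-in for the Cortex-M IPSR register:
   0 = thread mode, non-zero = active exception number. */
static uint32_t sim_ipsr;

static int in_isr(void)
{
    return sim_ipsr != 0U;
}

/* Hypothetical lock-taking API that refuses to run in interrupt context. */
static HAL_StatusTypeDef demo_transmit_it(void)
{
    if (in_isr()) {
        return HAL_ERROR;   /* a debug build could assert here instead */
    }
    /* ... take the lock and start the transfer ... */
    return HAL_OK;
}

/* Helper for the sketch: run fn as if called from exception exc_num. */
static HAL_StatusTypeDef call_from_fake_isr(HAL_StatusTypeDef (*fn)(void),
                                            uint32_t exc_num)
{
    assert(fn != NULL && exc_num != 0U);
    uint32_t saved = sim_ipsr;
    sim_ipsr = exc_num;
    HAL_StatusTypeDef st = fn();
    sim_ipsr = saved;
    return st;
}
```

Whether rejecting the call is the right policy is a separate question; the sketch only shows that the context check itself is cheap.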

    Visitor II
    February 20, 2020

    Thanks @MMAST.1​. I'm glad to hear this issue will be fixed soon. I noticed ST was working on it back in 2016, so I assume it's being rolled up with another major release.

    https://community.st.com/s/question/0D50X00009XkeOGSAZ/questions-surrounding-hallock

    Cheers

    David

    Visitor II
    February 20, 2020

    Thanks for the heads-up @Piranha​ .

    Visitor II
    March 18, 2020

    Hello,

    An overview of the lock mechanism coming soon in the STM32Cube package:

    1- Critical section : The critical section mechanism is based on saving PRIMASK on the stack and restoring it, instead of unconditionally enabling IRQs in the Exit CS phase.

    Typical use of this method is illustrated in the pseudo code below:

    HAL_StatusTypeDef HAL_PPP_Process (PPP_HandleTypeDef *hppp, __PARAMS__)
    {
      __HAL_ENTER_CRITICAL_SECTION();
      /* Protected resources */
      __HAL_EXIT_CRITICAL_SECTION();
    }

    The Enter/Exit CS functions are implemented as macros in the stm32ynxx_hal_def.h file as follows, for both bare metal and RTOS cases:

    #if (USE_RTOS == 1)
     #define __HAL_ENTER_CRITICAL_SECTION() OsEnterCriticalSection()
     #define __HAL_EXIT_CRITICAL_SECTION()  OsExitCriticalSection()
    #else
    #define __HAL_ENTER_CRITICAL_SECTION()     \
       uint32_t PriMsk;                       \
       PriMsk = __get_PRIMASK();              \
       __set_PRIMASK(1);
    #define __HAL_EXIT_CRITICAL_SECTION()      \
       __set_PRIMASK(PriMsk);
    #endif
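Why restoring the saved PRIMASK matters can be shown with a small host-side sketch. The sim_primask variable and the sim_get_primask()/sim_set_primask() functions are hypothetical stand-ins for the CMSIS intrinsics __get_PRIMASK()/__set_PRIMASK(), and cs_enter()/cs_exit() are illustrative names, not the macros ST proposes.

```c
#include <assert.h>
#include <stdint.h>

/* Host stand-ins for the CMSIS PRIMASK intrinsics. */
static uint32_t sim_primask;   /* 0 = interrupts enabled, 1 = masked */

static uint32_t sim_get_primask(void)
{
    return sim_primask;
}

static void sim_set_primask(uint32_t v)
{
    sim_primask = v & 1U;
}

/* Enter: remember the previous mask state, then mask interrupts. */
static uint32_t cs_enter(void)
{
    uint32_t saved = sim_get_primask();
    sim_set_primask(1U);
    return saved;
}

/* Exit: restore what was saved instead of blindly enabling interrupts. */
static void cs_exit(uint32_t saved)
{
    assert(saved <= 1U);
    sim_set_primask(saved);
}

/* Returns PRIMASK as seen after the inner section has exited but
   while the outer section is still active. */
static uint32_t nested_demo(void)
{
    uint32_t outer = cs_enter();
    uint32_t inner = cs_enter();
    cs_exit(inner);
    uint32_t mid = sim_get_primask();   /* interrupts must still be masked */
    cs_exit(outer);
    return mid;
}
```

Because the inner exit restores the value it saved, interrupts stay masked until the outermost section ends, which an unconditional enable on exit would not guarantee.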

    2- Lock mechanism : The lock object is an entity allocated in the peripheral driver handles and defined for each standalone process. For full duplex processes with simultaneous transfer, 2 lock objects shall be used. For peripherals with sub-instances (channels, endpoints, etc.) a lock object per sub-instance shall be defined.

    A lock macro is used before starting any process as follows :

    HAL_StatusTypeDef HAL_PPP_Process (PPP_HandleTypeDef *hppp, __PARAMS__, uint32_t Timeout)
    {
     uint32_t tickstart = HAL_GetTick();

     if(__ARGS__ == WRONG_PARAMS)
     {
       hppp->ErrorCode = HAL_PPP_ERROR_PARAM;
       return HAL_ERROR;
     }

     if(HAL_Lock (hppp->iLock) == HAL_LOCKED)
     {
       return HAL_BUSY;
     }
    (...)

    Lock methods for ARMv7/ ARMv8

    /**
     * @brief  Attempts to acquire the lock.
     * @param  lock   Pointer to variable used for the lock.
     * @details This is an interrupt safe function that can be used as a mutex.
               The lock variable shall remain in scope until the lock is released.
               Will not block if another thread has acquired the lock.
     * @returns HAL_LOCKED if everything successful, HAL_UNLOCK if lock is taken.
     */
    __STATIC_INLINE HAL_LockStateTypeDef HAL_Lock(__IO uint32_t *lock)
    {
       do {
           /* Return if the lock is taken by a different thread */
           if(__LDREXW(lock) != HAL_UNLOCKED) {
               return HAL_LOCKED;
           }
           /* Attempt to take the lock */
       } while(__STREXW(HAL_LOCKED, lock) != 0);

       /* Do not start any other memory access until memory barrier is complete */
       __DMB();
       return HAL_UNLOCKED;
    }

    /**
     * @brief  Free the given lock.
     * @param  lock   Pointer to variable used for the lock.
     */
    __STATIC_INLINE void HAL_UnLock(uint32_t *lock)
    {
       /* Ensure memory operations complete before releasing lock*/
       __DMB();
       *lock = HAL_UNLOCKED;
    }

    Lock methods for ARMv6

    /**
     * @brief  Attempts to acquire the lock.
     * @param  lock   Pointer to variable used for the lock.
     * @details This is an interrupt safe function that can be used as a mutex.
               The lock variable shall remain in scope until the lock is released.
               Will not block if another thread has acquired the lock.
     * @returns HAL_LOCKED if everything successful, HAL_UNLOCK if lock is taken.
     */
    __STATIC_INLINE HAL_LockStateTypeDef HAL_Lock(__IO uint32_t *lock)
    {
     uint32_t oldvalue;

     __HAL_SAVE_PRIMASK();
     __HAL_ENTER_CRITICAL_SECTION();

     oldvalue = *lock;
     if(*lock == HAL_UNLOCKED)
     {
       *lock = HAL_LOCKED;
     }

     __HAL_EXIT_CRITICAL_SECTION();
     return (oldvalue);
    }

    /**
     * @brief  Free the given lock.
     * @param  lock   Pointer to variable used for the lock.
     */
    __STATIC_INLINE void HAL_UnLock(__IO uint32_t *lock)
    {
       *lock = HAL_UNLOCKED;
    }

    The above implementations are used in a non-RTOS environment. When an RTOS is used (USE_RTOS), the lock is simply a semaphore take (unlock = semaphore release).

    This way, when a process is locked in RTOS env. the current process is pended till the semaphore is freed, then the process resumes once the semaphore is released.

    Rds

    Explorer
    March 19, 2020

    Hi @MMAST.1​ 

    Thanks for posting the update and the opportunity to review it here.

    There are some areas needing some more work. Please accept my comments constructively….

    This is a summary of the LDREX and STREX instructions:

    1. The LDREX syntax (simplified): LDREX Rt, [Rn], performs these steps:
      1. Loads the data (Rt) from memory address (Rn)
      2. Set the exclusive access tag
    2. The exclusive access tag is cleared by:
      1. Exception entry or exit, or
      2. Executing STREX or CLREX
    3. The STREX syntax (simplified): STREX Rd, Rt, [Rn], performs these steps:
      1. If the exclusive access tag is set
        1. Store the data (Rt) to memory address (Rn)
        2. Assign status (Rd) = 0
      2. Else
        1. Assign status (Rd) = 1
      3. Clear the exclusive access tag
    4. There is an exclusive access tag per address for memories with a Shared TLB attribute and a single exclusive access tag for all other memories.
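The summary above can be turned into a toy model that runs on a host. This is only a sketch of the described behaviour for a single local monitor, not of the real hardware. The model_ldrex(), model_strex(), model_exception() and model_try_store() names are hypothetical; the argument order of model_strex() follows the CMSIS __STREXW(value, addr) intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one local exclusive access tag. */
static int excl_tag;

/* LDREX: load the word and set the exclusive access tag. */
static uint32_t model_ldrex(const volatile uint32_t *addr)
{
    assert(addr != NULL);
    excl_tag = 1;
    return *addr;
}

/* STREX: store only if the tag is still set.
   Returns 0 on success and 1 on failure; the tag is always cleared. */
static uint32_t model_strex(uint32_t value, volatile uint32_t *addr)
{
    assert(addr != NULL);
    uint32_t status = 1U;
    if (excl_tag) {
        *addr = value;
        status = 0U;
    }
    excl_tag = 0;
    return status;
}

/* Exception entry or exit clears the tag. */
static void model_exception(void)
{
    excl_tag = 0;
}

/* Returns the STREX status for one LDREX/STREX pair, optionally with an
   exception taken in between. */
static uint32_t model_try_store(volatile uint32_t *addr, uint32_t value,
                                int interrupted)
{
    (void)model_ldrex(addr);
    if (interrupted) {
        model_exception();
    }
    return model_strex(value, addr);
}
```

An uninterrupted pair stores the value and reports status 0, while a pair with an exception in between leaves memory untouched and reports status 1, which is what forces the retry loop in a lock built on these instructions.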

    For the MCUs equipped with the LDREX and STREX instructions, this is what your HAL_Lock function does:

    1. Loop
      1. Read value from lock’s memory address
      2. Set the exclusive access tag
      3. If the value != HAL_UNLOCKED 
        1. Return HAL_LOCKED
      4. If the exclusive access tag is set, 
        1. Write HAL_LOCKED to lock’s memory address
        2. Status = 0
      5. Else
        1. Status = 1
      6. Clear the exclusive access tag
      7. Break if status == 0
    2. Data memory barrier

    These are its outcomes:

    1. If lock was already HAL_LOCKED or the caller’s thread is pre-empted and another thread takes the lock, then return HAL_LOCKED to indicate a different thread has the lock, i.e. the peripheral’s busy.
    2. Otherwise, loop until it’s atomically changed lock from HAL_UNLOCKED to HAL_LOCKED, then return HAL_UNLOCKED to indicate it’s successfully taken the lock. Predominant case is the thread is not pre-empted and so the loop is not taken.
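The same two outcomes can be sketched portably with C11 atomics, as a stand-in for the LDREX/STREX loop. This is an assumption-laden illustration rather than the proposed HAL code: demo_try_lock(), demo_unlock() and the DEMO_* constants are hypothetical names, and whether a given toolchain lowers the compare-exchange to an LDREX/STREX loop is compiler-dependent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_UNLOCKED 0U
#define DEMO_LOCKED   1U

/* Returns true if this caller took the lock, false if it was already held.
   Acquire ordering on success plays the role of the barrier after the
   exclusive store. */
static bool demo_try_lock(atomic_uint *lock)
{
    assert(lock != NULL);
    unsigned expected = DEMO_UNLOCKED;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, DEMO_LOCKED,
        memory_order_acquire, memory_order_relaxed);
}

/* Release ordering makes earlier writes visible before the lock is freed. */
static void demo_unlock(atomic_uint *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(lock, DEMO_UNLOCKED, memory_order_release);
}
```

A boolean result also sidesteps the naming confusion discussed under PROBLEM #2, because the caller only needs to know whether it now owns the lock.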

    PROBLEM #1. The HAL_Lock function’s detail description “The lock variable shall remain in scope until the lock is released” is incorrect. It either obtains the lock for its thread or it detects another thread has it.

    PROBLEM #2. The HAL_Lock function’s returns description is incorrect/inaccurate, and the HAL_UNLOCKED and HAL_LOCKED returns do not describe the function’s operation well and so a casual reader might incorrectly assume it is only reading the lock. It would read better if it returned HAL_LOCKED if it obtained the lock and HAL_BUSY otherwise.

    PROBLEM #3. For the MCUs without LDREX and STREX instructions, the HAL_Lock function would execute faster and the code would be smaller if “if(*lock == HAL_UNLOCKED)” were replaced with “if(oldvalue == HAL_UNLOCKED)”. Remember *lock is volatile and the compiler has to load it from memory again. But you have already read it to oldvalue which could be a register, and would still be smaller code if it were local, and you have already entered a critical section.
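A host-side sketch of the variant suggested in PROBLEM #3, with a single volatile read inside the critical section. The sim_* functions are hypothetical stand-ins for the CMSIS PRIMASK intrinsics and demo_lock() is an illustrative name; the return convention (previous lock state) is kept from the posted proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { HAL_UNLOCKED = 0, HAL_LOCKED = 1 } HAL_LockStateTypeDef;

/* Host stand-ins for __get_PRIMASK()/__set_PRIMASK(). */
static uint32_t sim_primask;   /* 0 = interrupts enabled, 1 = masked */

static uint32_t sim_get_primask(void)
{
    return sim_primask;
}

static void sim_set_primask(uint32_t v)
{
    sim_primask = v & 1U;
}

/* Returns the previous state: HAL_UNLOCKED means the caller now owns
   the lock, HAL_LOCKED means somebody else already holds it. */
static HAL_LockStateTypeDef demo_lock(volatile uint32_t *lock)
{
    assert(lock != NULL);
    uint32_t saved = sim_get_primask();
    sim_set_primask(1U);                    /* enter critical section */

    uint32_t oldvalue = *lock;              /* the only volatile read */
    if (oldvalue == (uint32_t)HAL_UNLOCKED) {
        *lock = (uint32_t)HAL_LOCKED;
    }

    sim_set_primask(saved);                 /* restore the previous mask */
    return (HAL_LockStateTypeDef)oldvalue;
}
```

Testing oldvalue instead of *lock avoids the second load that the volatile qualifier would otherwise force.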

    REQUEST #1. Please add a method to turn off HAL’s locks.

    I layer my apps so that the calls to each peripheral driver are single-threaded, or each direction is single-threaded if the peripheral supports simultaneous receive and transmit, and so my apps never see a busy error. If my app needs to output from more than one task, those tasks send to a task dedicated to the peripheral (or to its output channel if it is duplex), where the output is queued and started as soon as the last output finishes, or immediately if no output is in progress. Similarly for receive: if my app needs to send received data to different tasks, a task dedicated to the peripheral interrogates the data or checks an application mode (with suitable protection) to determine where to forward it.

    In summary, I design my apps to always work correctly.

    Further, my company chooses the smallest/cheapest part to do a job. So I want to save easy cycles.

    Your method to disable HAL locks might be like this:

    1. In the HAL Settings pane of Cube’s Project Manager/Code Generator screen, add a check box labelled: “Enable Lock checking” and default it Enabled.
    2. Add a macro assigned the state to the stm32h7xx_hal_conf.h file it generates.
    3. Refactor the HAL code to use the macro.

    REQUEST #2. Please add a method to turn off HAL’s parameter checking. I’ve debugged. I’m accepting that the MCU may be struck by sub-atomic particles. I accept the risks. Please turn them off the same way as the HAL’s locks.

    I do not have a good grasp why HAL locks are necessary. But clearly they are, else other developers would be asking for ways to turn them off too.

    THOUGHT #1. What does a task dedicated to a peripheral or one of its channels look like? As an example, this is one of my go-to methods for a dedicated task to handle a peripheral’s transmit channel:

    1. A call-based function enqueues the output. Queue may be OS or not. If not, mutex if protection is necessary. Dimensioned queue per throughput and burstiness. Block the caller or drop on abnormal queue full per requirements. Notify the task if no output is in progress, with critical section to avoid race.
    2. A transmit complete interrupt notifies the task.
    3. On notify, the task starts next output.
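The three steps above can be sketched on a host as a small ring-buffer queue. All names here (TxItem, tx_enqueue(), tx_task_kick(), tx_complete_isr(), start_hw(), TXQ_DEPTH) are hypothetical, the "notify the task" step is collapsed into a direct function call, and on target the queue updates would need the critical section or mutex mentioned in step 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TXQ_DEPTH 4U   /* hypothetical; size for throughput and burstiness */

typedef struct {
    const uint8_t *data;
    size_t len;
} TxItem;

static TxItem txq[TXQ_DEPTH];
static size_t txq_head;
static size_t txq_count;
static bool tx_busy;
static unsigned started;   /* transfers handed to the (fake) driver */

/* Stand-in for starting the peripheral from the dedicated task. */
static void start_hw(const TxItem *item)
{
    assert(item != NULL);
    tx_busy = true;
    started++;
}

/* Step 3: on notify, start the next output if the channel is idle. */
static void tx_task_kick(void)
{
    if (!tx_busy && txq_count > 0U) {
        TxItem item = txq[txq_head];
        txq_head = (txq_head + 1U) % TXQ_DEPTH;
        txq_count--;
        start_hw(&item);
    }
}

/* Step 1: enqueue the output; returns false (drop) when the queue is full. */
static bool tx_enqueue(const uint8_t *data, size_t len)
{
    if (txq_count == TXQ_DEPTH) {
        return false;
    }
    txq[(txq_head + txq_count) % TXQ_DEPTH] = (TxItem){ data, len };
    txq_count++;
    tx_task_kick();   /* notify only matters if no output is in progress */
    return true;
}

/* Step 2: the transmit-complete interrupt notifies the task. */
static void tx_complete_isr(void)
{
    tx_busy = false;
    tx_task_kick();
}
```

Since only the dedicated task path ever starts the hardware, the driver is never entered while it is busy and a BUSY return never has to be handled.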

    THOUGHT #2. Does HAL have locks only because we can’t engineer our apps properly?

    If you develop apps with more than one thread accessing a peripheral or one of its channels, I’ll poke with some tongue-in-cheek questions…

    1. What is your go-to plan for its throwing BUSY? Just drop? I hope not.
    2. So you have decided BUSY is normal. You are designing your app for it. What are your go-to methods?
    3. Do you wait-retry until not BUSY? You should not do that in interrupt.
    4. Does the peripheral driver need to be called from interrupt? If that were true and it throws BUSY, do you poll and burn cycles there? Or, because it had to be started from interrupt, do you start a timer as delay and retry from the timer’s interrupt?
    5. Or, as a workaround, if BUSY is thrown in interrupt, do you notify a task to retry? But if you did that, why do you not reduce your app’s paths of execution and make it only notify, and then the BUSY wouldn’t occur?

    Post your thoughts.

    Visitor II
    March 19, 2020

    Thanks a lot Alister, your feedback is more than appreciated. As I mentioned, the listing I wrote is an overview of the update. We will take your feedback into account to improve the mechanism. Thanks again.

    PiranhaAuthor
    Graduate II
    April 1, 2020

    In addition to what Alister said...

    #define __HAL_ENTER_CRITICAL_SECTION() \
     uint32_t PriMsk; \

    This will get you into trouble if the lock/unlock is needed multiple times in a single function/block. Also, it's not clear what __HAL_SAVE_PRIMASK(); does if __HAL_ENTER_CRITICAL_SECTION(); also does the same. Probably something like this should be introduced and put at the top of the function:

    #define __HAL_DECLARE_CRITICAL_SECTION() uint32_t PriMsk

    Or another, simpler solution is to implement one global critical section nesting counter, as is done in FreeRTOS, for example.
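A minimal host-side sketch of the nesting-counter idea. It is written in the spirit of that approach and is not FreeRTOS code: cs_enter(), cs_exit(), cs_nesting and the sim_irq_masked flag (a stand-in for __disable_irq()/__enable_irq()) are hypothetical names. The scheme assumes interrupts were enabled before the outermost entry.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sim_irq_masked;   /* host stand-in: 1 = interrupts masked */
static uint32_t cs_nesting;       /* one global nesting counter */

static void cs_enter(void)
{
    sim_irq_masked = 1U;          /* on target: __disable_irq() */
    cs_nesting++;
}

static void cs_exit(void)
{
    assert(cs_nesting > 0U);      /* unbalanced exit is a programming error */
    cs_nesting--;
    if (cs_nesting == 0U) {
        sim_irq_masked = 0U;      /* on target: __enable_irq() */
    }
}

/* Returns the mask state seen between the inner and the outer exit. */
static uint32_t nested_demo(void)
{
    cs_enter();
    cs_enter();
    cs_exit();
    uint32_t mid = sim_irq_masked;   /* still masked: outer section active */
    cs_exit();
    return mid;
}
```

No per-function PriMsk variable is needed with this design, so the macro can be used any number of times in the same block.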

    > when a process is locked in RTOS env. the current process is pended till the semaphore is freed

    So for a non-RTOS environment HAL_Lock() would be non-blocking and return BUSY, but for an RTOS it would be blocking and never fail. That is inconsistent and leads to confusion and significantly different usage in each case. I mostly agree with Alister that HAL_Lock() is unnecessary and damaging. And yes, there really is no sane scenario for what to do when HAL_Lock() returns BUSY anyway. Managing access to a peripheral is a task for higher platform layer code, not the driver.