A guide to the HAL of the AES accelerator, or how to fix it
Hi,
I am writing this both as a guide for other people struggling with the STM32 AES accelerator and as a suggestion to STM on how to improve the HAL so that it is easier to use. My testing and debugging was done on an STM32WB, but I think the accelerators on other STM32 chips behave similarly. I previously used an STM32L4A6, and its accelerator seemed to behave the same way.
First, these are the badly documented properties of the HAL that you might run into:
1:
Setting the DataWidthUnit field to use byte buffers only changes how the data itself is processed. The key and IV are still expected as word buffers.
2:
The accelerator can only process data whose size is a multiple of 4 bytes. The HAL does not report an error when it is passed an invalid data size, but the output will not be a valid AES result.
3:
When using an AES mode that takes an initialization vector (IV), it might seem that the HAL always expects 4 words / 16 bytes as the IV, but this is not true. It actually expects 3 words / 12 bytes, and the last word must always be set according to the reference manual: 0x00000001 for AES-CTR and 0x00000002 for AES-GCM. As a result, the example code generated by CubeMX (with key and IV set to 0) will never produce valid results, because it breaks the HAL's assumption that the last word of the IV is set properly.
Update for 3: AES-CTR actually seems to work with a full 128-bit IV. I had previously tested only AES-GCM and found that the specification is correct there, so I assumed it was also correct for AES-CTR, but that does not seem to be the case. For details, see the comments below.
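To illustrate points 1 and 3, here is a minimal sketch of how a byte-oriented key and nonce could be converted into the word buffers the HAL expects. The helper names pack_be_words and build_iv_words are my own and not part of the HAL. The byte order within each word (first byte in the most significant position) is an assumption based on how key arrays are written in ST's example projects, so please verify it against a known test vector on your chip.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the HAL): pack a byte buffer into
 * 32-bit words, with the first byte of each group of four placed in the
 * most significant position. ASSUMPTION: this is the word layout the
 * HAL expects for key and IV; check it against a known test vector.
 * Returns 0 on success, -1 if len is not a multiple of 4 or if the
 * output buffer is too small. */
static int pack_be_words(const uint8_t *in, size_t len,
                         uint32_t *out, size_t out_words)
{
    if ((len % 4u) != 0u || (len / 4u) > out_words) {
        return -1;
    }
    for (size_t i = 0; i < len / 4u; i++) {
        out[i] = ((uint32_t)in[4u * i] << 24) |
                 ((uint32_t)in[4u * i + 1u] << 16) |
                 ((uint32_t)in[4u * i + 2u] << 8) |
                 (uint32_t)in[4u * i + 3u];
    }
    return 0;
}

/* Hypothetical helper: build the 4-word IV buffer from a 12-byte nonce.
 * The caller supplies the value of the last word, for example
 * 0x00000002 for AES-GCM as described in point 3. */
static int build_iv_words(const uint8_t nonce[12], uint32_t last_word,
                          uint32_t iv[4])
{
    if (pack_be_words(nonce, 12u, iv, 3u) != 0) {
        return -1;
    }
    iv[3] = last_word;
    return 0;
}
```

The resulting arrays would then be what the pKey and pInitVect fields of the CRYP init structure point to. Given the update above, the fixed last word may only be relevant for AES-GCM.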
Now to the improvements to the HAL that could be made here:
In general, I think it should not be necessary to read the chip's reference manual to use HAL code. The HAL code should be documented well enough that the reference manual is not needed. Also, when the HAL does not report an error, no user will expect the result to be invalid.
1:
In general I am fine with this property of the HAL, but it could be documented more clearly. However, adding code that converts from byte buffers to word buffers when writing them into the registers would greatly improve usability, because keys and IVs are usually stored as byte buffers. This code should also be relatively small, and at least smaller than converting the data to word buffers before calling the HAL.
2:
Here the minimal solution would be to at least document in the HAL that sizes are expected to be multiples of 4 bytes, for example in the documentation of the HAL_CRYP_Encrypt function. But a check of the input data size should also be implemented, as no one will expect to receive invalid data when the HAL returns HAL_OK.
3:
When only three words of the IV can actually be used, then why do you let the user supply 4 words? Also, why is CubeMX generating example code which will never generate valid results? I would suggest shortening the IV to 3 words. Then the user can set those 3 words, and the HAL would fill in the correct value for the last word. But again, at least document the assumptions of the HAL.
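Until something like suggestion 2 is implemented, a check can be done on the user side before calling the HAL. The following is only a sketch: aes_check_size and its result enum are hypothetical names of my own, and it assumes DataWidthUnit is set to bytes so that the size passed to the HAL is a byte count. The upper limit comes from the Size parameter of HAL_CRYP_Encrypt being a uint16_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for the size check (not part of the HAL). */
typedef enum {
    AES_SIZE_OK = 0,
    AES_SIZE_NOT_WORD_MULTIPLE, /* size is not a multiple of 4 bytes */
    AES_SIZE_TOO_LARGE          /* size does not fit into a uint16_t */
} aes_size_check_t;

/* Validate a data size in bytes before handing it to the HAL.
 * ASSUMPTION: DataWidthUnit is configured for bytes, so the value
 * passed as Size to the HAL is this byte count. */
static aes_size_check_t aes_check_size(size_t size_bytes)
{
    if ((size_bytes % 4u) != 0u) {
        return AES_SIZE_NOT_WORD_MULTIPLE;
    }
    if (size_bytes > UINT16_MAX) {
        return AES_SIZE_TOO_LARGE;
    }
    return AES_SIZE_OK;
}
```

A wrapper around HAL_CRYP_Encrypt could call this first and return an error instead of forwarding a size the accelerator cannot handle, so that HAL_OK is only ever seen together with a valid result.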
TL;DR:
The HAL of the AES accelerator has multiple badly documented or undocumented properties. It should not be necessary to read the reference manual to know how to use HAL code, so please, STM, improve the HAL. To STM, feel free to contact me for more details.
