Explorer
March 27, 2023
Question

Optee PKCS11 TA Performance really bad!


Hello, I use OP-TEE on an stm32mp157f-dk2 board with the following versions from the corresponding BSP:

  • optee-os 3.16.0-stm32mp
  • u-boot v2021.10-stm32mp
  • Linux v5.15-stm32mp

All my changes are committed and built using a Yocto meta-layer: https://github.com/embetrix/meta-stm32mp15x

The OP-TEE build config is described here: https://github.com/embetrix/meta-stm32mp15x/blob/kirkstone/recipes-security/optee/optee-os-stm32mp_3.16.0.bb#L33

I enabled the PKCS11 TA, which is not enabled by default, and gave it a try:

EC Prime256 Keypair generation:

# time pkcs11-tool --keypairgen --key-type EC:prime256v1 --label "testkeyEC" --id 1 --login --usage-sign --module /usr/lib/libckteec.so.0
Using slot 0 with a present token (0x0)
Key pair generated:
Private Key Object; EC
 label: testkeyEC
 ID: 01
 Usage: sign
 Access: sensitive, always sensitive, never extractable, local
Public Key Object; EC EC_POINT 256 bits
 EC_POINT: 044104d7506303c183c36445ef2d5161a5cfe1effaeb12a7b41ef458bc27811d2ddd915518917cd385ec3572032483a6a2efbeb539f585be9d443754862716fabc609d
 EC_PARAMS: 06082a8648ce3d030107
 label: testkeyEC
 ID: 01
 Usage: verify
 Access: local
real 1m 4.92s
user 0m 0.01s
sys 0m 31.37s

RSA 2048 Keypair generation:

# time pkcs11-tool --keypairgen --key-type RSA:2048 --label "testkeyRSA" --id 2 --login --usage-sign --module /usr/lib/libckteec.so.0 
Using slot 0 with a present token (0x0)
Key pair generated:
Private Key Object; RSA 
 label: testkeyRSA
 ID: 02
 Usage: sign
 Access: sensitive, always sensitive, never extractable, local
Public Key Object; RSA 2048 bits
 label: testkeyRSA
 ID: 02
 Usage: verify
 Access: local
real 0m 43.02s
user 0m 0.00s
sys 0m 20.82s

This takes far too long for any real-world application :( It is also strange that the ECC prime256 operation takes longer than RSA 2048!

For the sake of comparison, I tried the official mainline OP-TEE build using the https://github.com/OP-TEE/manifest/blob/master/stm32mp1.xml manifest.

I got much better times!

EC Prime256 Keypair generation:

# time pkcs11-tool --keypairgen --key-type EC:prime256v1 --label "testkeyEC" --id 1 --login --usage-sign --module /usr/lib/libckteec.so.0
D/TC:? 0 tee_ta_init_session_with_context:624 Re-open TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
Using slot 0 with a present token (0x0)
Key pair generated:
Private Key Object; EC
 label: testkeyEC
 ID: 01
 Usage: sign
 Access: sensitive, always sensitive, never extractable, local
Public Key Object; EC EC_POINT 256 bits
 EC_POINT: 0441045172428126d0dd3db11d2aaaaf7f7ad5fb4dddc0ad932f12145c6d42306c5a6212d71d9ab5378400c7bced1d31060b881bac7e6ebf66d88e238327920ec2f477
 EC_PARAMS: 06082a8648ce3d030107
 label: testkeyEC
 ID: 01
 Usage: verify
 Access: local
D/TC:? 0 tee_ta_close_session:529 csess 0x2ffce880 id 1
D/TC:? 0 tee_ta_close_session:548 Destroy session
real 0m 4.14s
user 0m 0.00s
sys 0m 3.96s

RSA 2048 Keypair generation:

# time pkcs11-tool --keypairgen --key-type RSA:2048 --label "testkeyRSA" --id 2 --login --usage-sign --module /usr/lib/libckteec.so.0
D/TC:? 0 tee_ta_init_session_with_context:624 Re-open TA fd02c9da-306c-48c7-a49c-bbd827ae86ee
Using slot 0 with a present token (0x0)
Key pair generated:
Private Key Object; RSA 
 label: testkeyRSA
 ID: 02
 Usage: sign
 Access: sensitive, always sensitive, never extractable, local
Public Key Object; RSA 2048 bits
 label: testkeyRSA
 ID: 02
 Usage: verify
 Access: local
D/TC:? 0 tee_ta_close_session:529 csess 0x2ffce880 id 1
D/TC:? 0 tee_ta_close_session:548 Destroy session
real 0m 15.59s
user 0m 0.00s
sys 0m 15.43s
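
To put the two builds side by side, the `real` times above can be turned into a ratio. The sketch below is only an illustration: the helper names `to_seconds` and `slowdown` are made up for this example, and it assumes the `Xm Y.YYs` format printed by `time` in the transcripts above.

```shell
#!/bin/sh
# Convert a `time` value such as "1m 4.92s" into seconds.
# (Hypothetical helper, not part of pkcs11-tool or OP-TEE.)
to_seconds() {
    echo "$1" | awk '{
        m = $1; s = $2
        sub(/m$/, "", m)   # strip the trailing "m" from the minutes field
        sub(/s$/, "", s)   # strip the trailing "s" from the seconds field
        printf "%.2f\n", m * 60 + s
    }'
}

# Ratio between two timings: first argument divided by second argument.
slowdown() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

slowdown "1m 4.92s" "0m 4.14s"     # EC prime256v1: ST BSP build vs. mainline build
slowdown "0m 43.02s" "0m 15.59s"   # RSA 2048: ST BSP build vs. mainline build
```

With the figures quoted above, this works out to roughly 15.7x for EC prime256v1 and 2.8x for RSA 2048.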

For now I am stuck with the latest official ST BSP release for U-Boot and the kernel, and with the new mainline OP-TEE 3.20 I cannot even boot the board.

ST's latest OP-TEE release is still 3.16.0-stm32mp, so my question is whether there are ways to tweak OP-TEE and remove bottlenecks to get better PKCS11 performance.

    This topic has been closed for replies.

    4 replies

    AZaki.2 (Author)
    Explorer
    March 27, 2023

    I asked the same question on the mainline OP-TEE OS GitHub:

    https://github.com/OP-TEE/optee_os/issues/5918

    There is a related issue:

    https://github.com/OP-TEE/optee_os/issues/5915

    But this problem looks like it is related to the STM32MP1 and the official ST BSP.

    Technical Moderator
    April 3, 2023

    Hello @AZaki.2,

    This PKCS11 use case performance issue is a known limitation specific to the STM32MP15.

    Performance is limited by the OP-TEE pager, and the impact depends on the tasks that OP-TEE has to handle in parallel. By comparison, the STM32MP13 is not affected because it does not use the OP-TEE pager.

    In more detail: there is no complete workaround that will improve performance significantly. However, reducing the OP-TEE workload can limit the degradation. Disabling debug in OP-TEE will already improve performance.
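
As one possible reading of "removing the debug", OP-TEE's build system has log-level and debug switches that can be passed on the make command line. The sketch below only assembles those flags. `OPTEE_PERF_FLAGS` is a name made up here, and whether these are exactly the settings meant above, or how much they help on the STM32MP15, is an assumption that needs to be measured on the board.

```shell
# Hypothetical variable name; the CFG_* switches are standard OP-TEE build options.
# Log level 0 silences core and TA traces; CFG_TEE_CORE_DEBUG=n drops debug checks.
OPTEE_PERF_FLAGS="CFG_TEE_CORE_LOG_LEVEL=0 CFG_TEE_TA_LOG_LEVEL=0 CFG_TEE_CORE_DEBUG=n"

# Print the flags so they can be appended to the existing OP-TEE make
# invocation (for example through EXTRA_OEMAKE in a Yocto recipe).
echo "$OPTEE_PERF_FLAGS"
```

Re-running the same `time pkcs11-tool --keypairgen ...` commands before and after the change is the simplest way to see whether the flags make a difference.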

    If you do not use the M4 coprocessor, you can also allocate the memory dedicated to the M4 to OP-TEE, which gives better results for key generation.

    I hope this information helps you move forward.

    Kind regards,

    Erwan.

    AZaki.2 (Author)
    Explorer
    April 3, 2023

    Hello @Erwan SZYMANSKI,

    Luckily, the M4 is not required for my application. How can I allocate the M4 memory to OP-TEE? Is this change required only in U-Boot, or is it also necessary to change other BSP components?

    Thank you for your support.

    Best regards

    Technical Moderator
    April 5, 2023

    Hello @AZaki.2,

    You can find some information about memory mapping on the wiki, for example in this article: https://wiki.st.com/stm32mpu/wiki/STM32MP15_RAM_mapping

    For information, I ran the same test on an STM32MP135F-DK board for comparison. The results are below:

    root@stm32mp13-disco:~# 12341234 --keypairgen --label testkey --key-type EC:prime256v1

    -sh: 12341234: not found

    root@stm32mp13-disco:~# time pkcs11-tool --module /usr/lib/libckteec.so.0 --label testtoken --login --pin 12341234 --keypairgen --label testkey --key-type rsa:2048
    Using slot 0 with a present token (0x0)
    Key pair generated:
    Private Key Object; RSA
     label:   testkey
     Usage:   decrypt, sign
     Access:   sensitive, always sensitive, never extractable, local
    Public Key Object; RSA 2048 bits
     label:   testkey
     Usage:   encrypt, verify
     Access:   local
    real  0m 5.63s
    user  0m 0.00s
    sys   0m 4.57s

    root@stm32mp13-disco:~# time pkcs11-tool --module /usr/lib/libckteec.so.0 --label testtoken --login --pin 12341234 --keypairgen --label testkey --key-type ec:prime256v1
    Using slot 0 with a present token (0x0)
    Key pair generated:
    Private Key Object; EC
     label:   testkey
     Usage:   sign, derive
     Access:   sensitive, always sensitive, never extractable, local
    Public Key Object; EC EC_POINT 256 bits
     EC_POINT:  044104179f72afa2693f3fe356fa652cd60379fb235415b19d5ed82ac82a26a91ae5ed1b24be2ceee5acff702fe77d71bbcb37438a278ebcabb69ef30ef660ac877a41
     EC_PARAMS: 06082a8648ce3d030107
     label:   testkey
     Usage:   verify, derive
     Access:   local
    real  0m 0.97s
    user  0m 0.01s
    sys   0m 0.20s

    Kind regards,

    Erwan.

    Graduate
    May 9, 2023

    How long does it take you to generate RSA 2048-bit keys on the U5 with the built-in PKA?