Associate II
March 20, 2025
Question

Issue Converting tf.matmul Model to .tflite for STM32MP257F-EV1 NPU Acceleration


Hello everyone,

I am trying to convert a tf.matmul operation into a .tflite model and deploy it on the STM32MP257F-EV1 NPU for acceleration. My inference inputs have shapes (1, 1, 384) and (1, 51865, 384). Below is my code:

import tensorflow as tf
import numpy as np

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)

# Create a new standalone model
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=output)
matmul_model.summary()

# Define the inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32) * 3
output_embeddings = tf.random.normal((1, 51865, 384), dtype=tf.float32) * 3

# Call the model with separate arguments
output = matmul_model([hidden_states, output_embeddings])
print(output.shape)

matmul_model.save("matmul_saved_model", save_format="tf")

# Load the saved encoder model
matmul_model = tf.keras.models.load_model("matmul_saved_model")

# Function to generate representative dataset
def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 51865, 384)).astype(np.float32)
        yield [hidden_states, output_embeddings]

# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False

    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32

    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)

    print(f"{output_tflite_path} saved successfully.")

# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)

Following the ST Edge AI tool guide, I used the command:
./stedgeai generate --target stm32mp25 -m matmul_int8.tflite --input-data-type float32 --output-data-type float32

However, I encountered the following error:
ST Edge AI Core v2.0.0-20049
PASS: 0%| | 0/2 [00:00<?, ?it/s]E 17:17:28 Acuity need 2 input files, but got 1

INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)

Does anyone know what might be causing this issue?
Is it possible to deploy a model with multiple inputs on the NPU?
Or am I missing something in my conversion process?

Any insights or suggestions would be greatly appreciated!
Thank you in advance for your help!






Julian E.
Technical Moderator
April 8, 2025

Hello @Justin_wu,

Sorry for the late answer. I had serious trouble with my PC, so it took me quite some time to test your issue.

 

It seems that in your code:

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)

the transpose_b=True argument is not supported.

 

If you transpose 'output_embeddings' yourself and remove 'transpose_b', it works.

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(384,51865), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings)
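
A minimal sketch of why this workaround should give the same result, using NumPy with small stand-in shapes (8 and 16 instead of 384 and 51865). The helper name matmul_transpose_b is hypothetical and only emulates what transpose_b=True does; it says nothing about which ops the NPU toolchain accepts:

```python
import numpy as np

def matmul_transpose_b(a, b):
    """Emulate tf.matmul(a, b, transpose_b=True): swap b's last two axes, then multiply."""
    return np.matmul(a, np.swapaxes(b, -1, -2))

rng = np.random.default_rng(0)
hidden = rng.normal(size=(1, 1, 8)).astype(np.float32)       # stands in for (1, 1, 384)
embeddings = rng.normal(size=(1, 16, 8)).astype(np.float32)  # stands in for (1, 51865, 384)

# Original formulation: the transpose happens inside the matmul.
with_flag = matmul_transpose_b(hidden, embeddings)

# Workaround: transpose once up front, then use a plain matmul.
embeddings_t = np.ascontiguousarray(np.swapaxes(embeddings, -1, -2))  # shape (1, 8, 16)
plain = np.matmul(hidden, embeddings_t)

print(with_flag.shape, plain.shape)
print(np.allclose(with_flag, plain, atol=1e-5))
```

If the embeddings are a fixed matrix, the transpose only has to be done once, before the data is fed to the model.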

 

Have a good day,

Julian

​In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
Julian E.
Technical Moderator
April 8, 2025

Hello @Justin_wu,

 

Also, we found a bug thanks to your code: please avoid using "output" in the name of any of your inputs.

It should get fixed, but for now, make sure not to use it.
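
A small sketch of a pre-conversion check for this. It assumes the problem is triggered by "output" appearing anywhere in an input name, case-insensitively, which goes slightly beyond what is stated here; the function name find_risky_input_names is hypothetical:

```python
def find_risky_input_names(input_names, banned="output"):
    """Return the input names that contain the banned substring (case-insensitive)."""
    return [name for name in input_names if banned.lower() in name.lower()]

names = ["hidden_states", "output_embeddings"]
risky = find_risky_input_names(names)
print(risky)  # ['output_embeddings']
```

Any name it reports can be changed before conversion, for example by passing a different name= to tf.keras.Input.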

 

Have a good day,

Julian

Justin_wu (Author)
Associate II
April 12, 2025

Hello Julian E.,

Thank you very much for your reply and clarification! I followed your suggestion and changed my code to the following:

hidden_states = tf.keras.layers.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.layers.Input(shape=(384, 51865), dtype=tf.float32)
result = tf.matmul(hidden_states, output_embeddings)
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=result)

# Define the dummy inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32)
output_embeddings = tf.random.normal((1, 384, 51865), dtype=tf.float32)
result = matmul_model([hidden_states, output_embeddings])
matmul_model.save("matmul_saved_model", save_format="tf")

def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 384, 51865)).astype(np.float32)
        yield [hidden_states, output_embeddings]

# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False

    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32

    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)

    print(f"{output_tflite_path} saved successfully.")

# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)

 

But I still get the same error when I try to convert the .tflite file to .nb format with this command:

$ ./stedgeai generate --target stm32mp25 -m ./matmul_int8.tflite --input-data-type float32 --output-data-type float32

ST Edge AI Core v2.0.0-20049
PASS: 0%| | 0/2 [00:00<?, ?it/s]E 14:07:53 Acuity need 2 input files, but got 1
 
INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)

 

So I wonder whether the way I generate the .tflite file is wrong, or whether I am using the wrong command to convert it to .nb format with the ST Edge AI tool. Thanks a lot!

 

Best regards,

Justin

Justin_wu (Author)
Associate II
April 19, 2025

Hello Julian E.,

Thanks for testing again! Strangely, I still have problems converting matmul_int8.tflite, even when I use the .tflite file from your test_user.zip.

So I wonder whether I am using the tool incorrectly or did not install it correctly. I cannot reproduce the files in the st_ai_ws directory inside your test_user.zip: even when I convert other .tflite models to .nb successfully, those files are not generated.

Here is how I installed and use the tools, in detail:

I installed the ST Edge AI tool with stedgeai-linux-onlineinstaller from en.stedgeai-lin.zip, which I downloaded from https://www.st.com/en/development-tools/stedgeai-core.html. During installation, it asked me to provide the ST Neural-ART archive, so I supplied en.stedgeai-stneuralart-10.0.0.zip.

After the installation finished, I ran the command

./stedgeai generate --target stm32mp25 -m ./matmul_int8.tflite --input-data-type float32 --output-data-type float32

from the directory ~/stedgeai_tool/2.0/Utilities/linux

 

When I convert some model, say, model1.tflite into model1.nb, the tool generates ./stm32ai_ws/report_model1.json. It also generates model1.nb and two empty folders, /inc and /src, under ./stm32ai_output.

 

May I ask how you got the files under ./stm32ai_ws and ./stm32ai_output inside test_user.zip?

Maybe something went wrong when I installed the ST Edge AI tool? Thanks!

Sorry to bother you again.

 

Best regards,

Justin

Julian E.
Technical Moderator
April 22, 2025

Hello @Justin_wu,

 

Sorry, I made a mistake: I did not use the stm32mp2 as the target.

Because MCU, MPU and NPU targets are different, the supported layers differ and you can get different errors.

 

ST Edge AI Core for MPU is supported only on Linux, so I need to find a Linux PC to test it.

I'll update you as soon as possible.

 

Have a good day,

Julian

Julian E.
Technical Moderator
April 24, 2025

Hello @Justin_wu,

So the issue is that --input-data-type and --output-data-type are not supported for stm32mp targets.

(https://stedgeai-dc.st.com/assets/embedded-docs/command_line_interface.html#ref_input_data_type_option)

 

Without these arguments, ST Edge AI Core processes the model without errors.
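
For anyone scripting the conversion, here is a minimal sketch of building the command with those two options stripped for stm32mp targets. The helper name build_generate_cmd and the constant UNSUPPORTED_ON_MP are hypothetical; the only CLI arguments used are the ones already shown in this thread:

```python
import shlex

# Options this thread reports as unsupported for stm32mp targets.
UNSUPPORTED_ON_MP = ("--input-data-type", "--output-data-type")

def build_generate_cmd(model_path, target="stm32mp25", extra_args=()):
    """Build the stedgeai argument list, dropping unsupported options and their values."""
    cmd = ["./stedgeai", "generate", "--target", target, "-m", model_path]
    args = list(extra_args)
    i = 0
    while i < len(args):
        if target.startswith("stm32mp") and args[i] in UNSUPPORTED_ON_MP:
            i += 2  # skip the option and its value
            continue
        cmd.append(args[i])
        i += 1
    return cmd

cmd = build_generate_cmd(
    "./matmul_int8.tflite",
    extra_args=["--input-data-type", "float32", "--output-data-type", "float32"],
)
print(shlex.join(cmd))  # ./stedgeai generate --target stm32mp25 -m ./matmul_int8.tflite
```

The resulting list can be passed to subprocess.run from the directory that contains the stedgeai binary.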

 

Have a good day,

Julian