JCame.3
Associate
April 3, 2023
Question

Error when exporting PyTorch model "NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted"

  • 7 replies
  • 9245 views

I am using the STM32Cube.AI Developer Cloud to convert my ONNX model that I built using PyTorch.

Here is my export code:

import os

import torch

# `model` and `filename` are defined elsewhere (not shown in this post)
input_size = [1, 8, 1000]  # (batch, channels, samples)

x = torch.randn(input_size)

onnx_folder_path = 'onnx_models/'
if not os.path.isdir(onnx_folder_path):
    os.mkdir(onnx_folder_path)
onnx_filename = "{}{}.onnx".format(onnx_folder_path, filename)

torch.onnx.export(model,                       # model being run
                  x,                           # model input (or a tuple for multiple inputs)
                  onnx_filename,               # where to save the model (can be a file or file-like object)
                  # export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=11,            # the ONNX opset version to export the model to
                  # do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input_1'],     # the model's input names
                  output_names=['output_1'],   # the model's output names
                  )

And my model code:

import torch.nn as nn
import torch.nn.functional as F


class Custom1DCNN(nn.Module):
    def __init__(self, n_input=128, n_output=7, n_channel=8, pretrained=None):
        super().__init__()

        input_0 = n_channel
        input_1 = n_input
        input_2 = n_input // 4
        input_3 = n_input // 8

        self.conv1 = nn.Conv1d(input_0, input_1, kernel_size=3)
        self.bn1 = nn.BatchNorm1d(input_1)

        self.conv2 = nn.Conv1d(input_1, input_2, kernel_size=3)
        self.bn2 = nn.BatchNorm1d(input_2)

        self.conv3 = nn.Conv1d(input_2, input_3, kernel_size=3)
        self.bn3 = nn.BatchNorm1d(input_3)

        self.avgpool = nn.AdaptiveAvgPool1d(1)

        self.fc1 = nn.Linear(input_3, n_output)

        self.activation = nn.ReLU()

        if pretrained is not None:
            # load_pretrained() is not shown in this post
            self.load_pretrained(pretrained)
            self.is_pretrained = True
        else:
            self.is_pretrained = False

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.activation(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.activation(x)

        x = self.conv3(x)
        x = self.bn3(x)
        x = self.activation(x)

        x = self.avgpool(x)     # (N, C, L) -> (N, C, 1)
        x = x.permute(0, 2, 1)  # (N, C, 1) -> (N, 1, C)
        x = self.fc1(x)         # Linear applied to a 3D tensor
        x = x.flatten(1)        # (N, 1, n_output) -> (N, n_output)
        x = F.softmax(x, dim=1)

        return x

I am getting the following error:

>>> stm32ai validate --model exported_1d_cnn_input_1000.onnx --workspace workspace --output output --allocate-inputs --allocate-outputs --relocatable --compression none --optimization balanced
Neural Network Tools for STM32AI v1.6.0 (STM.ai v7.3.0-RC5)
NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted

I would appreciate guidance because this is blocking my research.

This topic has been closed for replies.

fauvarque.daniel
ST Employee
April 3, 2023

Can you share the ONNX file (even one trained with random data)?

Thanks a lot

JCame.3 (Author)
Associate
April 3, 2023

fauvarque.daniel
ST Employee
April 3, 2023

I've reproduced your issue on the current development code and raised a bug with the development team. I'll let you know about a possible workaround.

JCame.3 (Author)
Associate
April 3, 2023

Thanks Daniel - a workaround would be excellent

ST Employee
September 8, 2023

Hi,

I was able to reproduce your issue, and I was also able to pass the model through Cube.AI.

Applying permute, then the fully connected layer, then flatten, in that order, creates a "MatMul" node in your ONNX graph that has a 3D input and a 2D weight matrix, which is probably what causes the issue in Cube.AI. I was able to fix this by moving the fully connected layer after the flatten layer. I'm not sure whether changing the architecture like this will affect your model's performance; you would have to check that.
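
A minimal sketch of that reordering, for illustration only. The names `ReorderedHead` and `original_head` are hypothetical and not from the original model; the feature size 16 corresponds to n_input // 8 with the default n_input=128, and the length 994 is what three kernel_size=3 convolutions leave from 1000 samples. This sketch also drops the permute, which is an assumption on my side: after flatten it is no longer needed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReorderedHead(nn.Module):
    """Hypothetical classifier head: pool -> flatten -> fc -> softmax."""

    def __init__(self, n_features=16, n_output=7):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool1d(1)
        self.fc1 = nn.Linear(n_features, n_output)

    def forward(self, x):
        x = self.avgpool(x)  # (N, C, L) -> (N, C, 1)
        x = x.flatten(1)     # (N, C, 1) -> (N, C)
        x = self.fc1(x)      # Linear now sees a 2D input
        return F.softmax(x, dim=1)


def original_head(head, x):
    """Original ordering from the question, reusing the same weights."""
    x = head.avgpool(x)
    x = x.permute(0, 2, 1)   # (N, C, 1) -> (N, 1, C)
    x = head.fc1(x)          # Linear applied to a 3D input
    x = x.flatten(1)         # (N, 1, n_output) -> (N, n_output)
    return F.softmax(x, dim=1)


torch.manual_seed(0)
head = ReorderedHead().eval()
features = torch.randn(4, 16, 994)  # stand-in for the conv3/bn3/ReLU output

with torch.no_grad():
    reordered = head(features)
    original = original_head(head, features)

# Compare the two orderings on the same weights and input
same = torch.allclose(reordered, original, atol=1e-6)
print(reordered.shape, same)
```

Because the adaptive pooling leaves a length of 1, both orderings should compute the same values from the same weights, so under that assumption the trained parameters can be reused as they are. It is still worth checking this on your own model and data.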

Thanks for your detailed submission.

CaoJuan
Visitor II
April 18, 2023

Hi, did you solve this problem? If so, could you please share the solution? Thank you very much!

wg_it
Associate II
May 4, 2023

I have the same problem. How did you solve it? Thank you very much.

Associate
September 22, 2023

Have you solved this problem? If so, can you share the workaround? Thank you very much.

fauvarque.daniel
ST Employee
October 3, 2023

Sorry for the late answer. The problem has been fixed in X-CUBE-AI 8.1, and the model can now be analyzed and validated:

Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)

Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 1, 8, 1000)/float32, min/max=[0.000, 1.000], mean/std=[0.500, 0.288], input_1
No output/reference samples are provided
Copying the AI runtime files to the user workspace: C:\Users\fauvarqd\Downloads\stm32ai_ws\inspector_network\workspace

Exec/report summary (validate)
-------------------------------------------------------------------------------------
model file : C:\Users\fauvarqd\Downloads\exported_1d_cnn_input_1000.onnx
type : onnx
c_name : network
compression : lossless
optimization : balanced
workspace dir : C:\Users\fauvarqd\Downloads\stm32ai_ws
output dir : C:\Users\fauvarqd\Downloads\stm32ai_output
model_fmt : float
model_name : exported_1d_cnn_input_1000
model_hash : 294ebca9dfef0693515b87f952908f4d
params # : 17,191 items (67.15 KiB)
-------------------------------------------------------------------------------------
input 1/1 : 'input_1' (domain:user/)
: 8000 items, 31.25 KiB, ai_float, float, (1,1,8,1000)
output 1/1 : 'output_1' (domain:user/)
: 7 items, 28 B, ai_float, float, (1,7)
macc : 17,031,315
weights (ro) : 68,764 B (67.15 KiB) (1 segment)
activations (rw) : 512,160 B (500.16 KiB) (1 segment)
ram (total) : 544,188 B (531.43 KiB) = 512,160 + 32,000 + 28
-------------------------------------------------------------------------------------

Running the STM AI c-model (AI RUNNER)...(name=network, mode=x86)

X86 shared lib (C:\Users\fauvarqd\Downloads\stm32ai_ws\inspector_network\workspace\lib\libai_network.dll) ['network']

Summary "network" - ['network']
----------------------------------------------------------------------------------------------
inputs/ouputs : 1/1
input_1 : input_1, (1,1,8,1000), float32, 32,000 bytes, user
output_1 : output_1, (1,1,1,7), float32, 28 bytes, user
n_nodes : 13
compile_datetime : Oct 3 2023 11:29:26
activations : 512160
weights : 68764
macc : 17031315
----------------------------------------------------------------------------------------------
runtime : STM.AI(/) 8.1.0 (Tools 8.1.0) -
capabilities : IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
device : AMD64 Intel64 Family 6 Model 142 Stepping 12, GenuineIntel (Windows)

NOTE: duration and exec time per layer is just an indication. They are dependent of the HOST-machine work-load.

STM.AI Profiling results v1.2 - network
---------------------------------------------------------------
nb sample(s) : 10
duration : 35.863ms by sample (34.324/36.488/0.565)
macc : 17031315
---------------------------------------------------------------
HOST duration : 0.379s (total)
---------------------------------------------------------------

Inference time per node
--------------------------------------------------------------------------
c_id  m_id  type               dur (ms)      %  name
--------------------------------------------------------------------------
0     1     Transpose (0x10a)     0.066   0.2%  ai_node_0
1     1     Transpose (0x10a)     0.062   0.2%  ai_node_1
2     2     Conv2D (0x103)        7.543  21.0%  ai_node_2
3     3     NL (0x107)            0.623   1.7%  ai_node_3
4     4     Conv2D (0x103)       24.180  67.4%  ai_node_4
5     5     NL (0x107)            0.103   0.3%  ai_node_5
6     6     Conv2D (0x103)        3.215   9.0%  ai_node_6
7     7     NL (0x107)            0.033   0.1%  ai_node_7
8     8     Pool (0x10b)          0.023   0.1%  ai_node_8
9     10    Dense (0x104)         0.004   0.0%  ai_node_9
10    11    Eltwise (0x113)       0.002   0.0%  ai_node_10
11    12    Transpose (0x10a)     0.001   0.0%  ai_node_11
12    13    NL (0x107)            0.004   0.0%  ai_node_12
--------------------------------------------------------------------------
total                            35.860
--------------------------------------------------------------------------

Statistic per tensor
-----------------------------------------------------------------------------
tensor shape/type min max mean std name
-----------------------------------------------------------------------------
I.0 (1,1,8,1000)/float32 0.000 1.000 0.500 0.288 input_1
O.0 (1,1,1,7)/float32 0.118 0.168 0.143 0.020 output_1
-----------------------------------------------------------------------------

Running the ONNX model...

Saving validation data...
output directory: C:\Users\fauvarqd\Downloads\stm32ai_output
creating C:\Users\fauvarqd\Downloads\stm32ai_output\network_val_io.npz
m_outputs_1: (10, 1, 1, 7)/float64, min/max=[0.118, 0.168], mean/std=[0.143, 0.020], output_1
c_outputs_1: (10, 1, 1, 7)/float32, min/max=[0.118, 0.168], mean/std=[0.143, 0.020], output_1

Computing the metrics...

Cross accuracy report #1 (reference vs C-model)
----------------------------------------------------------------------------------------------------
notes: - data type is different: r/float64 instead p/float32
- the output of the reference model is used as ground truth/reference value
- 10 samples (7 items per sample)

acc=100.00%, rmse=0.000021865, mae=0.000015804, l2r=0.000151639, nse=100.00%, cos=100.00%

7 classes (10 samples)
-------------------------------------------
C0 0 . . . . . .
C1 . 0 . . . . .
C2 . . 10 . . . .
C3 . . . 0 . . .
C4 . . . . 0 . .
C5 . . . . . 0 .
C6 . . . . . . 0

Evaluation report (summary)
--------------------------------------------------------------------------------------------------------------------------------------------------
Output acc rmse mae l2r mean std nse cos tensor
--------------------------------------------------------------------------------------------------------------------------------------------------
X-cross #1 100.00% 0.0000219 0.0000158 0.0001516 0.0000000 0.0000220 0.9999988 1.0000000 output_1, ai_float, (1,7), m_id=[13]
--------------------------------------------------------------------------------------------------------------------------------------------------

acc : Classification accuracy (all classes)
rmse : Root Mean Squared Error
mae : Mean Absolute Error
l2r : L2 relative error
nse : Nash-Sutcliffe efficiency criteria
cos : COsine Similarity

Creating txt report file C:\Users\fauvarqd\Downloads\stm32ai_output\network_validate_report.txt
elapsed time (validate): 6.563s