Hello @lyannen,
I'm not really familiar with .pkl files, but here is what I understood after a bit of research.
You should extract the mappings from your .pkl file and write them out as flat arrays or C-style lookup tables (directly into a .h file), something like:
(this code is AI generated)
import pickle

with open("mymapping.pkl", "rb") as f:
    mappings = pickle.load(f)

char_to_index = mappings["char_to_index"]
index_to_char = mappings["index_to_char"]

# Export as C arrays
with open("char_map.h", "w") as f:
    f.write("const char* index_to_char[] = {\n")
    for i in range(len(index_to_char)):
        f.write(f'    "{index_to_char[i]}",\n')
    f.write("};\n\n")

    f.write("const int char_to_index[128] = {\n")  # assuming ASCII range
    for i in range(128):
        char = chr(i)
        idx = char_to_index.get(char, -1)  # -1 = character not in the vocabulary
        # Only put printable characters literally in the comment; a raw
        # control char (e.g. '\n') would break the generated C comment.
        comment = f"// '{char}'" if char.isprintable() else f"// chr({i})"
        f.write(f"    {idx},  {comment}\n")
    f.write("};\n")
At runtime you can then:
- Before feeding input into the model: convert your text to int indices using char_to_index.
- After inference: convert the model outputs back to characters using index_to_char.
If you use X-CUBE-AI with the Application template: in main.c, in Cube_MX_AI_Run() (the name could be different, I don't remember exactly), there are pre- and post-process template functions. You should edit these templates to do your conversions there.
I don't know how you handle tokenization; you may also have to re-implement minimal tokenizer logic in embedded C.
Did you try to convert your model with ST Edge AI Core? I know that sequence models can face issues during conversion. Did it work for you?
Have a good day,
Julian