NanoEdge AI Studio 5.0.2 high offline accuracy; poor on STEVAL-STWINBX1

Question

Dear ST Support Team,

I am currently working on a binary voice classification task on the STEVAL-STWINBX1 (for example, “echo” vs “other”) using NanoEdge AI Studio 5.0.2, and I would like to ask for your advice.

My workflow

Data collection
I use STEVAL-STWINBX1 to collect voice data.
Each sample is 1 second long.
I collected more than 200 samples per class.
Data acquisition firmware: fp-sns-datalog2\fp-sns-datalog2\STM32CubeFunctionPack_DATALOG2_V3.1.0\Projects\STM32U585AI-STWIN.box\Applications\DATALOG2

Dataset conversion
I use batch_to_nanoedge.bat to batch-convert the collected files into NanoEdge-compatible format.

Training in NanoEdge AI Studio 5.0.2
I import the converted data into NanoEdge AI Studio 5.0.2.
Perform Data Management (DM).
Train a classification model.
Many generated models show accuracy above 97% in Studio.

Deployment to MCU
I deploy the generated library to the MCU following this ST wiki page: https://wiki.st.com/stm32mcu/wiki/AI:How_to_create_a_current_sensing_classifier_using_NanoEdge_AI_Studio
I integrate the library into: FP-AI-MONITOR2_16_3\FP-AI-MONITOR2_V1.0.0_RC8\FP-AI-MONITOR2_V1.0.0\Middlewares\ST\NanoEdge_AI_Library

Problem

Although the classification accuracy in NanoEdge AI Studio is very high (often >97%), the real-time classification accuracy on the MCU is very poor.

My question / suspicion

I suspect that the data used by NanoEdge AI Studio for training/classification may not be the same representation as the raw data sent to the NanoEdge library on the MCU. For example:
Studio-side data may be normalized, or
converted using microphone sensitivity scaling
while the MCU-side classifier may be receiving raw sensor data directly.
This possible mismatch might explain the large accuracy gap between Studio and MCU deployment.

Questions

Is there any issue with the workflow I am using?
For voice classification on STEVAL-STWINBX1, what is the recommended way to ensure that the training data format and MCU runtime input format are strictly consistent?
Does NanoEdge AI Studio expect raw sensor samples, sensitivity-scaled values, or normalized inputs for this type of workflow?
Have you seen similar cases (high Studio accuracy but poor MCU accuracy), and what are the common causes / best practices to solve them?

Any guidance would be greatly appreciated.
Thank you very much for your support.
Best regards,

Julian E. · Answer

Hi @dzf,

Your workflow seems correct. I suspect overfitting or an issue in your firmware.

Could you please split your data into a training set and a test set (70%/30%)? Please shuffle the data before splitting.

Then run a benchmark with the train data and use the "Validation step" to test some libraries with the test data.
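The shuffle-and-split step can be sketched in Python as follows. This is only an illustration, assuming each class is stored in its own text file with one signal per line (the usual NanoEdge input layout); the function names, file paths, and the fixed seed are hypothetical and not part of any ST tool.

```python
import random


def shuffle_split(signals, train_fraction=0.7, seed=42):
    """Shuffle a copy of `signals` and return (train, test) lists."""
    shuffled = list(signals)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def split_file(src_path, train_path, test_path, train_fraction=0.7, seed=42):
    """Split one per-class file (one signal per line) into two files."""
    with open(src_path, encoding="utf-8") as f:
        # Skip empty lines so they are not counted as signals.
        signals = [line for line in f.read().splitlines() if line.strip()]
    train, test = shuffle_split(signals, train_fraction, seed)
    for path, rows in ((train_path, train), (test_path, test)):
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(row + "\n" for row in rows)
    return len(train), len(test)
```

Run it once per class file (for example, once for "echo" and once for "other"), import the training files for the benchmark, and keep the test files for the "Validation step".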

What can happen is that the libraries found last in the benchmark overfit the data: they work well on the training data, but not so well on "new"/test data.

If you have bad validation results, this is probably an overfitting problem:

Try to use more data for the benchmark
Try to see if "worse libraries" or libraries found earlier in the benchmark work better.

If you have good validation results, then the problem is not coming from the library:

Make sure that the data you acquire in the firmware to run the inference is in exactly the same format as the data you collected to train the model.
Make sure that your sensor is collecting data correctly.
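One rough way to check these two points is to dump a buffer from the firmware (for example over UART) just before it is passed to the library and compare it with a line from the training file. The Python sketch below is a minimal illustration under that assumption; the function names and the `max_ratio` threshold are hypothetical, and a large amplitude ratio only hints at a scaling mismatch (raw versus sensitivity-scaled or normalized values), it does not prove one.

```python
import statistics


def signal_stats(values):
    """Basic summary statistics for one signal (a list of numbers)."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


def amplitude_ratio(training_signal, runtime_buffer):
    """Peak absolute amplitude of the runtime buffer divided by training."""
    train_peak = max(abs(v) for v in training_signal)
    run_peak = max(abs(v) for v in runtime_buffer)
    if train_peak == 0:
        raise ValueError("training signal is all zeros")
    return run_peak / train_peak


def looks_consistent(training_signal, runtime_buffer, max_ratio=4.0):
    """Heuristic: same length and peak amplitudes within `max_ratio`."""
    if len(training_signal) != len(runtime_buffer):
        return False  # buffer length differs from the training signals
    ratio = amplitude_ratio(training_signal, runtime_buffer)
    return 1.0 / max_ratio <= ratio <= max_ratio
```

Recordings naturally differ in loudness, so the threshold is deliberately loose; a ratio off by several orders of magnitude is the case worth investigating.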

Have a good day,

Julian
