ALiss
Associate II
September 10, 2019
Solved

Spectrogram recognition with X-Cube AI on STM32F746

Hello,

I am doing a project where I will be implementing a trained neural network (trained with Keras) onto a STM32F746-DISCOVERY board with X-Cube AI. The goal is to train the network to recognize audio samples converted into spectrograms. This would mean that on the microcontroller, I would need to convert the audio input into spectrogram images, and then input that into the neural network for recognition.

Does anyone have any good sources or similar projects regarding either creating spectrograms on a STM32 microcontroller or a good image recognition project on an STM32 MCU using x-cube AI?

Thank you!

8 replies

KnarfB
Super User
September 10, 2019

for the signal processing part you may look at X-CUBE-DSPDEMO

ALiss (Author)
Associate II
September 10, 2019

Thank you! I will look into it.

Gln (Best answer)
ST Employee
September 10, 2019

​Hi @ALiss​ ,

In FP-AI-SENSING1 v3.0.0, there is an STM32_AI_AudioPreprocessing_Library middleware that can be used for exactly this purpose. The library provides the building blocks for spectral analysis and feature extraction, such as:

  • Spectrogram computation
  • Mel-scaled and LogMel-scaled spectrogram computation
  • Mel-frequency cepstral coefficients (MFCCs) computation

A (custom) usage example can be found in asc_processing.c.

Best regards,

Guillaume

ALiss (Author)
Associate II
September 10, 2019

That is great, thank you Guillaume!

Gln
ST Employee
December 2, 2019

Hello @Gerardo Trotta,

Sure. What is your issue?

Guillaume

Gerardo Trotta
Associate II
December 2, 2019

Hello @Gln,

The first one is in this function:

ASC_OutputTypeDef ASC_Run(float32_t *pBuffer)
{
  ai_float dense_2_out[AI_CONTACT_OUT_1_SIZE] = {0.0, 0.0, 0.0};

  /* Create a Mel-scaled spectrogram column */
  MelSpectrogramColumn(&S_MelSpectr, pBuffer, aColBuffer);
  /* Reshape and copy into output spectrogram column */
  for (int i = 0; i < NMELS; i++)
  {
    aSpectrogram[i * SPECTROGRAM_COLS + SpectrColIndex] = aColBuffer[i];
  }
  SpectrColIndex++;

  if (SpectrColIndex == SPECTROGRAM_COLS)
  {
    SpectrColIndex = 0;

    /* Convert to LogMel-scaled Spectrogram */
    PowerTodB(aSpectrogram);

    /* Run AI Network */
    ASC_NN_Run(aSpectrogram, dense_2_out);

    /* AI Network post processing */
    //ClassificationCode = ASC_PostProc(dense_2_out);

    return ClassificationCode;
  }
  else
  {
    return ASC_UNDEFINED;
  }
}

The variable SpectrColIndex is not initialized. On my H7 the reshape loop does not work unless SpectrColIndex is initialized to zero before calling ASC_Run.

The second one is a crash during ai_run. But this may be because I have to recalculate the common tables, isn't it?

Thank you

JJ

Gln
ST Employee
December 3, 2019

SpectrColIndex is a global variable, part of the bss segment. It is initialized to 0 during startup.

If needed, you can explicitly initialize SpectrColIndex to 0 in ASC_Init().
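As a minimal sketch of the two options above (not the FP-AI-SENSING1 source): an object with static storage and no initializer starts at zero, which on an MCU relies on the startup code clearing the bss region. If a port places the variable in a memory region that the startup code does not clear, an explicit reset in the init function removes the dependency. The function names ASC_Init_sketch and ASC_GetColIndex_sketch are hypothetical.

```c
#include <stdint.h>

/* Static storage, no initializer: the C standard requires this to start
 * at zero. On an MCU the startup code does that by clearing .bss. */
static uint32_t SpectrColIndex;

/* Hypothetical init hook: resets the column index explicitly, so the code
 * no longer depends on where the linker script placed the variable. */
void ASC_Init_sketch(void)
{
    SpectrColIndex = 0U;
}

/* Hypothetical accessor, only here so the value can be inspected. */
uint32_t ASC_GetColIndex_sketch(void)
{
    return SpectrColIndex;
}
```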

The second crash, during aiRun, might be caused by insufficient stack space and/or I/O buffer address corruption.

Recalculating the common tables is only required if you change some preprocessing parameters and want to avoid going through MelFilterbank_Init and Window_Init at runtime. The lookup tables stored in common_tables.h are for a given configuration. If you are using different preprocessing parameters, these lookup tables can be created at runtime in RAM using the _Init() functions. This is not the case in FP-AI-SENSING1: the preprocessing lookup tables have been generated offline and stored in Flash using common_tables.c.

Regards,

Guillaume

Gerardo Trotta
Associate II
December 3, 2019

Hello,

Almost clear. Thank you.

The NN still crashes at a strange point, when exiting from this function:

ASC_StatusTypeDef ASC_NN_Run(float32_t *pSpectrogram, float32_t *pNetworkOut)
{
  ai_i8 AscNnOutput[AI_CONTACT_OUT_1_SIZE];
  ai_i8 AscNnInput[AI_CONTACT_IN_1_SIZE];

  /* Z-Score Scaling on input feature */
  for (uint32_t i = 0; i < SPECTROGRAM_ROWS * SPECTROGRAM_COLS; i++)
  {
    pSpectrogram[i] = (pSpectrogram[i] - featureScalerMean[i]) / featureScalerStd[i];
  }

  aiConvertInputFloat_2_Int8(AI_CONTACT_MODEL_NAME, AI_CONTACT_MODEL_CTX, pSpectrogram, AscNnInput);
  aiRun(AI_CONTACT_MODEL_NAME, AI_CONTACT_MODEL_CTX, AscNnInput, AscNnOutput);
  aiConvertOutputInt8_2_Float(AI_CONTACT_MODEL_NAME, AI_CONTACT_MODEL_CTX, AscNnOutput, pNetworkOut);

  return ASC_OK;
}

aiRun is executed, but on line 16 (the return ASC_OK; statement) it crashes. The stack is 2000 now (before it was 400). The NN report is:

input : input_0 [121 items, 484 B, ai_float, FLOAT32]
input (total) : 484 B
output : dense_3_nl [17 items, 68 B, ai_float, FLOAT32]
output (total) : 68 B
params # : 302,689 items (1182.38 KiB)
macc : 2,378,879
rom (ro) : 1,210,756 (1182.38 KiB) 
ram (rw) wb+io : 31,936 + 552 (31.19 KiB + 552 B)

What is the relation between the NN data and the minimum stack size?

Now I'm recalculating tables as you suggest.

Thank you
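One thing that may be worth checking here, offered as a guess rather than a diagnosis: the report lists the network input and output as FLOAT32, while ASC_NN_Run declares its local I/O buffers as ai_i8. Assuming AI_CONTACT_IN_1_SIZE and AI_CONTACT_OUT_1_SIZE are item counts (121 and 17), the local arrays would be a quarter of the size the report gives, and a network writing past them could corrupt the stack frame and only show up on return. The helper io_bytes below is hypothetical and just reproduces the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

#define NN_IN_ITEMS  121U  /* from the report: input_0, 121 items */
#define NN_OUT_ITEMS 17U   /* from the report: dense_3_nl, 17 items */

/* Hypothetical helper: bytes needed for an I/O buffer with the given
 * number of items and element size. */
static size_t io_bytes(size_t items, size_t elem_size)
{
    return items * elem_size;
}

/* With 4-byte floats:
 *   io_bytes(NN_IN_ITEMS,  sizeof(float))  matches the 484 B in the report
 *   io_bytes(NN_OUT_ITEMS, sizeof(float))  matches the  68 B in the report
 * With int8 elements the same item counts give only 121 B and 17 B. */
```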

Gln
ST Employee
December 3, 2019

Hello @Gerardo Trotta,

There is no direct link between the RAM size reported by X-CUBE-AI and the application stack size. However, you can use the aiSystemPerformance application in X-CUBE-AI to evaluate the stack size requirement for NN inference. See section 9.3, "Embedded C-model run-time performance", in UM2526. The run-time performance will report the 'used stack'.

But of course, your project stack size requirement will be greater than the NN inference stack size. I would recommend doing a stack size analysis in your project.
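One common way to do such an analysis is stack painting: fill the stack region with a known pattern at startup, run the application, then count how many words still hold the pattern. The sketch below is generic C, not an X-CUBE-AI API. The names stack_paint, stack_unused_words and STACK_FILL are hypothetical, and it assumes a descending stack (as on Cortex-M), so the untouched words sit at the low end of the region. In a real project the region bounds would come from the linker script, and painting must stop below the current stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5U  /* arbitrary, recognizable pattern */

/* Fill a stack region with the pattern (call before the region is in use). */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL;
    }
}

/* Count words, from the low address upward, that still hold the pattern.
 * With a descending stack these are the words never touched so far. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

The peak usage in bytes is then (words - stack_unused_words(base, words)) * 4. A word that happens to be written with the fill value would be miscounted, so the result is an estimate.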

For the tables, if memory is not an issue for you (the H7 has more memory than the L4), feel free to use the runtime table generation functions in ST_AI_AudioPreprocessing. For example:

#include <stdio.h>  /* printf */
#include <stdlib.h> /* exit */
#include "feature_extraction.h"

#define SAMPLE_RATE 16000U /* Input signal sampling rate */
#define FFT_LEN 2048U /* Number of FFT points. Must be greater or equal to FRAME_LEN */
#define NUM_FRAMES 14U /* Number of columns in spectrogram */
#define FRAME_LEN FFT_LEN /* Window length and then padded with zeros to match FFT_LEN. */
#define HOP_LEN 1024U /* Number of samples between the starts of successive frames. */
#define NUM_MELS 128U /* Number of mel bands */

float32_t pInBuffer[FRAME_LEN]; /* 8.0 KB */
float32_t pOutColBuffer[NUM_MELS]; /* 0.5 KB */
float32_t pOutMelSpectrogram[NUM_MELS * NUM_FRAMES]; /* 7.0 KB */
float32_t pSpectrScratchBuffer[FFT_LEN]; /* 8.0 KB */
float32_t pWindowFuncBuffer[FFT_LEN]; /* 8.0 KB */
uint32_t pMelFilterStartIndices[NUM_MELS]; /* 0.5 KB */
uint32_t pMelFilterStopIndices[NUM_MELS]; /* 0.5 KB */
float32_t pMelFilterCoefs[2020]; /* 7.9 KB */ /* Size given by S_MelFilter.CoefficientsLength */

/* Allocate buffers and structures */
arm_rfft_fast_instance_f32 S_Rfft; /* 24 B */
MelFilterTypeDef S_MelFilter; /* 48 B */
SpectrogramTypeDef S_Spectr; /* 28 B */
MelSpectrogramTypeDef S_MelSpectr; /* 8 B */
LogMelSpectrogramTypeDef S_LogMelSpectr; /* 16 B */

/*
 * Python equivalent:
 * librosa.feature.melspectrogram(y=y, sr=16000, n_mels=128, hop_length=1024, center=False)
 */


void Preprocessing_Init(void)
{
  /* Init window function */
  if (Window_Init(pWindowFuncBuffer, FRAME_LEN, WINDOW_HANN) != 0)
  {
    printf("Init error\n");
    exit(1);
  }

  /* Init RFFT */
  arm_rfft_fast_init_f32(&S_Rfft, FFT_LEN);

  /* Init Mel filter */
  S_MelFilter.pStartIndices = pMelFilterStartIndices;
  S_MelFilter.pStopIndices = pMelFilterStopIndices;
  S_MelFilter.pCoefficients = pMelFilterCoefs;
  S_MelFilter.NumMels = NUM_MELS;
  S_MelFilter.FFTLen = FFT_LEN;
  S_MelFilter.SampRate = SAMPLE_RATE;
  S_MelFilter.FMin = 0.0;
  S_MelFilter.FMax = S_MelFilter.SampRate / 2.0;
  S_MelFilter.Formula = MEL_SLANEY;
  S_MelFilter.Normalize = 1;
  S_MelFilter.Mel2F = 1;
  MelFilterbank_Init(&S_MelFilter);

  /* Init Spectrogram */
  S_Spectr.pRfft = &S_Rfft;
  S_Spectr.Type = SPECTRUM_TYPE_POWER;
  S_Spectr.pWindow = pWindowFuncBuffer;
  S_Spectr.SampRate = SAMPLE_RATE;
  S_Spectr.FrameLen = FRAME_LEN;
  S_Spectr.FFTLen = FFT_LEN;
  S_Spectr.pScratch = pSpectrScratchBuffer;

  /* Init MelSpectrogram */
  S_MelSpectr.SpectrogramConf = &S_Spectr;
  S_MelSpectr.MelFilter = &S_MelFilter;
}

void AudioPreprocessing_Run(int16_t *pInSignal)
{
  /* Create melspectrogram */
  for (uint32_t frame_index = 0; frame_index < NUM_FRAMES; frame_index++)
  {
    buf_to_float_normed(pInSignal + (frame_index * HOP_LEN), pInBuffer, FRAME_LEN);
    MelSpectrogramColumn(&S_MelSpectr, pInBuffer, pOutColBuffer);
    /* Reshape col into pOutMelSpectrogram */
    for (uint32_t i = 0; i < NUM_MELS; i++)
    {
      pOutMelSpectrogram[i * NUM_FRAMES + frame_index] = pOutColBuffer[i];
    }
  }
}
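A small worked check on the loop above: the last frame starts at (NUM_FRAMES - 1) * HOP_LEN and reads FRAME_LEN samples, so pInSignal has to hold at least FRAME_LEN + (NUM_FRAMES - 1) * HOP_LEN samples. The helper name required_input_samples is hypothetical and only spells out that arithmetic for the unpadded (center=False) framing.

```c
#include <stdint.h>

/* Hypothetical helper: minimum number of input samples needed to produce
 * num_frames frames of frame_len samples, advancing hop_len per frame. */
static uint32_t required_input_samples(uint32_t num_frames,
                                       uint32_t frame_len,
                                       uint32_t hop_len)
{
    if (num_frames == 0U) {
        return 0U;
    }
    return frame_len + (num_frames - 1U) * hop_len;
}
```

With the values from the example (14 frames, 2048-sample frames, hop of 1024), this gives 15360 samples, which is 0.96 s of audio at 16 kHz.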

Regards,

Guillaume

Gerardo Trotta
Associate II
December 6, 2019

Hello @Gln,

Very clear. Thank you.

I'm debugging now and noticed a weird situation. In aiConvertInputFloat_2_Int8, if bufferPtr->meta_info is null, how do we transform and scale from float32 to ai_i8?

int aiConvertInputFloat_2_Int8(const char *nn_name, const int idx,
                               ai_float *In_f32, ai_i8 *Out_int8)
{
  if( AI_HANDLE_NULL == net_ctx[idx].handle)
  {
    return -1;
  }
  ai_buffer * bufferPtr = &(net_ctx[idx].report.inputs[0]);
  ai_buffer_format format = bufferPtr->format;
  int size = AI_BUFFER_SIZE(bufferPtr);
  ai_float scale ;
  int zero_point ;

  if (AI_BUFFER_FMT_TYPE_Q != AI_BUFFER_FMT_GET_TYPE(format) &&\
      ! AI_BUFFER_FMT_GET_SIGN(format) &&\
      8 != AI_BUFFER_FMT_GET_BITS(format))
  {
    return -1;
  }
  if (AI_BUFFER_META_INFO_INTQ(bufferPtr->meta_info)) {
    scale = AI_BUFFER_META_INFO_INTQ_GET_SCALE(bufferPtr->meta_info, 0);
    if (scale != 0.0F)
    {
      scale= 1.0F/scale ;
    }
    else
    {
      return -1;
    }
    zero_point = AI_BUFFER_META_INFO_INTQ_GET_ZEROPOINT(bufferPtr->meta_info, 0);
  } else {
    return -1;
  }

  for (int i = 0; i < size ; i++)
  {
    Out_int8[i] = __SSAT((int32_t) roundf((float)zero_point + In_f32[i]*scale), 8);
  }
  return 0;
}
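Reading the function as quoted, and assuming AI_BUFFER_META_INFO_INTQ evaluates to false for a null meta_info, the else branch returns -1 before the loop, so Out_int8 is never written in that case. ASC_NN_Run as posted earlier does not check this return value. For reference, the conversion performed when quantization info is present can be sketched in portable C as below. The name quantize_i8 is hypothetical, and the explicit clamp stands in for the __SSAT intrinsic.

```c
#include <stdint.h>

/* Hypothetical portable version of the per-element conversion:
 * q = saturate_int8(round(zero_point + x / scale)).
 * The caller must ensure scale != 0, as the quoted function does. */
static int8_t quantize_i8(float x, float scale, int zero_point)
{
    float v = (float)zero_point + x / scale;

    /* Clamp before converting so the float-to-int cast cannot overflow. */
    if (v > 127.0f) {
        v = 127.0f;
    }
    if (v < -128.0f) {
        v = -128.0f;
    }

    /* Round half away from zero, the same rule roundf() uses. */
    return (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}
```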