Visitor II
March 20, 2025
Question

STM32H743 realtime audio processing with DSP

  • 3 replies
  • 4101 views

Hi there,

this is my first post here, and also the first time that I have worked with STM32 processors.
A few months ago I decided to start a new project for realtime audio processing with DSP,
and the STM32H7 series seemed like a good candidate for what I want to do.
So I bought a development board with an STM32H743, a couple of flash chips and a few CS4272 codecs.
I wired everything up and started writing the firmware.
I used a few YouTube videos as a guide and soon had the base code in place.
The problem is that I get distorted sound, possibly because of buffer synchronization issues.
Here are some details about the project. The clock is set to 480 MHz. The CS4272 is configured as standalone in slave mode and connected over I2S. The audio sampling frequency is 48 kHz and the data format is 24-bit samples in 32-bit frames. I have verified that all clocks are correct. The levels of the audio signal are within spec. I use a circular buffer with DMA in an array of words.
At first I tried adding some reverberation to the audio signal, and that worked better than I expected. The problem appears when I try to do an IR (impulse response) convolution on the signal. This is where I start to hear the distorted sound. I have tried reducing the buffer lengths and also the IR buffer, but nothing really changes.
Can someone guide me in troubleshooting this? I am using STM32CubeIDE with an ST-LINK debugger.

Regards.



    Graduate II
    March 20, 2025

    There is no specific problem description (that is, no source code), so only general advice is possible:

    • do not only "hear" the sound, check it with a scope and an audio analyser (some freeware PC audio stuff is usually good enough)
    • do not feed in complex audio, start with a sine only (see above, you actually don't want to hear 1 kHz all day :D )
    • make sure you cleanly get out what you put in without any signal processing (sine generator -> codec -> STM32 -> no DSP -> codec -> analyser)
    • measure the time your convolution algorithm takes (use the ARM cycle counter) and compare it to DMA buffer size x sampling period; maybe the algorithm is too slow ***
    • make sure you are using the floating point unit

    *** The H7 is pretty powerful, but given all the stuff people want from audio DSPs these days, I'd say a dedicated audio DSP with lots of "hardware accelerators" might be better for the job.
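
    To make the timing comparison in the list above concrete, here is a minimal sketch. It assumes the figures from the question (480 MHz core clock, 48 kHz sampling) and assumes the DMA buffer holds interleaved left/right 32-bit words. The helper names are hypothetical. On the target, the two timestamps would come from the DWT cycle counter (DWT->CYCCNT in the CMSIS core header) once it has been enabled; here they are plain arguments so the arithmetic can be checked on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles available to process one half of the circular DMA buffer.
 * words_per_half: 32-bit words in half a buffer (assumed interleaved)
 * channels:       slots per audio frame, e.g. 2 for left/right          */
static uint32_t half_buffer_budget_cycles(uint32_t cpu_hz, uint32_t fs_hz,
                                          uint32_t words_per_half,
                                          uint32_t channels)
{
    assert(fs_hz > 0 && channels > 0);
    uint32_t frames = words_per_half / channels;
    /* 64-bit intermediate so cpu_hz * frames cannot overflow */
    return (uint32_t)(((uint64_t)cpu_hz * frames) / fs_hz);
}

/* Elapsed cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single wrap-around.       */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}
```

    For example, with 1024 words per half buffer and two channels this gives 5,120,000 cycles per half buffer, which is 10,000 cycles per stereo frame. If the measured processing time for a half buffer is above that budget, the DMA catches up with the code.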

    psychegr (Author)
    Visitor II
    March 20, 2025

    OK, here is some of the code I have so far.

    #define FILTER_TAP_NUM 256
    #define BUFFER_SIZE		2048
    #define SAMPLING_FREQUENCY_HZ	48000.0f
    
    __attribute__((aligned(32))) int32_t adcData[BUFFER_SIZE];
    __attribute__((aligned(32))) int32_t dacData[BUFFER_SIZE];
    
    static volatile __attribute__((aligned(32))) int32_t *inBufPtr;
    static volatile __attribute__((aligned(32))) int32_t *outBufPtr;
    
    static float firdata [FILTER_TAP_NUM];
    static int firptr [FILTER_TAP_NUM];
    static int fir_w_ptr = 0;
    
    float Calc_FIR (float inSample) {
    	float inSampleF = inSample;
    	float outdata = 0;
    
    	for (int i = 0; i < FILTER_TAP_NUM; i++) {
    		outdata += (firdata[i]*cabinetIR[firptr[i]]); // cabinetIR has the FFT 
    		firptr[i]++;
    	}
    
    	firdata[fir_w_ptr] = inSampleF;
    	firptr[fir_w_ptr] = 0;
    	fir_w_ptr++;
    	if (fir_w_ptr == FILTER_TAP_NUM) fir_w_ptr=0;
    
    	return outdata;
    }
    
    void Process_HalfBuffer() {
    	// Input samples
    	static float leftIn		= 0.0f;
    	static float leftProcessed	= 0.0f;
    
    	// Loop through half of audio buffer (double buffering), convert int->float, apply processing, convert float->int, set output buffers
    	for (uint16_t i = 0; i < (BUFFER_SIZE/2); i += 2) {
    
    		/*
    		 * Convert current input samples (24-bits) to floats (two I2S data lines, two channels per data line)
    		 */
    		// Extract 24-bits via bit mask
    		inBufPtr[i]		&= 0xFFFFFF;
    		inBufPtr[i + 1]	&= 0xFFFFFF;
    
    		// Check if number is negative (sign bit)
    		if (inBufPtr[i] & 0x800000) {
    			inBufPtr[i] |= ~0xFFFFFF;
    		}
    
    		if (inBufPtr[i + 1] & 0x800000) {
    			inBufPtr[i + 1] |= ~0xFFFFFF;
    		}
    
    		// Normalise to float (-1.0, +1.0)
    		leftIn = (float) inBufPtr[i] / (float) (0x7FFFFF);
    
    		/*
    		 * Apply processing
    		 */
    		//leftProcessed = leftIn; // Passthru
    		//leftProcessed = (1.0f - wet) * leftIn + wet * Do_Reverb(leftIn); // Reverb
    		leftProcessed = Calc_FIR(leftIn);
    		leftProcessed *= 1.5f; // Volume Adjust
    
    		// Ensure output samples are within [-1.0,+1.0] range
    		if (leftProcessed < -1.0f) {
    			leftProcessed = -1.0f;
    		} else if (leftProcessed > 1.0f) {
    			leftProcessed = 1.0f;
    		}
    
    		// Scale to 24-bit signed integer and set output buffer
    		outBufPtr[i]	 = (int32_t)(leftProcessed * 0x7FFFFF);
    	}
    	dataReadyFlag = 0;
    }
    
    void HAL_I2SEx_TxRxHalfCpltCallback(I2S_HandleTypeDef *hi2s)
    {
    	inBufPtr = &(adcData[0]);
    	outBufPtr = &(dacData[0]);
    
    	//Process_HalfBuffer();
    
    	dataReadyFlag = 1;
    }
    
    void HAL_I2SEx_TxRxCpltCallback(I2S_HandleTypeDef *hi2s)
    {
    	inBufPtr = &(adcData[BUFFER_SIZE/2]);
    	outBufPtr = &(dacData[BUFFER_SIZE/2]);
    
    	//Process_HalfBuffer();
    
    	dataReadyFlag = 1;
    }
    
    int main(void)
    {
    
     /* USER CODE BEGIN 1 */
    	
     /* USER CODE END 1 */
    
     /* MPU Configuration--------------------------------------------------------*/
     MPU_Config();
    
     /* MCU Configuration--------------------------------------------------------*/
    
     /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
     HAL_Init();
    
     /* USER CODE BEGIN Init */
    
     /* USER CODE END Init */
    
     /* Configure the system clock */
     SystemClock_Config();
    
     /* USER CODE BEGIN SysInit */
    
     /* USER CODE END SysInit */
    
     /* Initialize all configured peripherals */
     MX_GPIO_Init();
     MX_DMA_Init();
     MX_SPI4_Init();
     MX_I2S3_Init();
     /* USER CODE BEGIN 2 */
     
     /* USER CODE END 2 */
    
     /* Infinite loop */
     /* USER CODE BEGIN WHILE */
     
     CS4272_Init();
     HAL_I2SEx_TransmitReceive_DMA(&hi2s3, (uint16_t *)dacData, (uint16_t *)adcData, BUFFER_SIZE);
     while (1)
     {
    	 if(dataReadyFlag) {
    		 Process_HalfBuffer();
    	 }
    
     /* USER CODE END WHILE */
    
     /* USER CODE BEGIN 3 */
     }
     /* USER CODE END 3 */
    }

     

    This is the cleaned-up code that does the audio processing. The truth is that I need to measure the "time cost" of the Calc_FIR() function. I am also attaching a photo of the oscilloscope with a 1 kHz sine wave. You will notice that the yellow trace has some noise (maybe high frequency), but the thing to look at is the "glitches". It is as if the buffer is misaligned or something. I can't understand why this happens. I even tried overlap-save code with the arm_copy_f32(), arm_rfft_fast_f32() and arm_cmplx_mult_cmplx_f32() functions from the CMSIS library, but I had no luck.

     

    [Attached image: 20250320_195256.jpg (oscilloscope photo)]

     

    Graduate II
    March 21, 2025

    So, as I said in my first post, the H7 is not made for that.

    50k cycles is just too many, but maybe you can optimize your code; maybe integer math is good enough.

    But do you at least get out what you put in without any processing?
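
    As one possible direction for the optimisation mentioned above, the sketch below shows a direct-form FIR with a mirrored delay line, so the inner loop is a plain multiply-accumulate over two contiguous arrays with no per-tap index table to update. It is an illustration under assumptions, not the original poster's code: fir_t and fir_process are hypothetical names, and unlike Calc_FIR() it includes the current sample in the output. CMSIS-DSP's arm_fir_f32() is a ready-made alternative.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const float *coeffs;  /* h[0..numTaps-1]; h[0] weights the newest sample */
    float       *state;   /* 2 * numTaps floats, zero-initialised by caller  */
    size_t       numTaps;
    size_t       pos;     /* write index, always in [0, numTaps)             */
} fir_t;

/* Push one input sample and return one output sample. */
static float fir_process(fir_t *f, float x)
{
    assert(f != NULL && f->numTaps > 0);

    /* Step backwards so that state[pos + k] is the sample k steps old. */
    f->pos = (f->pos == 0) ? f->numTaps - 1 : f->pos - 1;

    /* Write the sample twice; the mirror copy keeps the window
     * state[pos .. pos + numTaps - 1] contiguous without any modulo. */
    f->state[f->pos] = x;
    f->state[f->pos + f->numTaps] = x;

    const float *s = &f->state[f->pos];
    float acc = 0.0f;
    for (size_t k = 0; k < f->numTaps; k++) {
        acc += f->coeffs[k] * s[k];
    }
    return acc;
}
```

    Feeding a unit impulse into a filter set up this way returns the coefficients one by one, which is a quick check before measuring its cycle count against the per-sample budget.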

    psychegr (Author)
    Visitor II
    March 21, 2025

    If you look at the code above, the two commented-out lines in Process_HalfBuffer() (marked "Passthru" and "Reverb") are different tests. The first one is a passthrough. The second one is a reverb effect. Both of these work perfectly! The problem is with the test on the line right after them, the call to Calc_FIR(). That one distorts the sound. I have an F411 Nucleo here somewhere and I will test the code on it to see how it behaves.

    Graduate II
    March 21, 2025

    Then compare the execution time of Do_Reverb() with that of your FIR filter.

     

    I'm pretty sure the glitches are caused by the interrupts setting the I2S buffer pointers:

    - the processing starts in the main loop when the data ready flag is set

    - while the "too long" processing is still running, the I2S (or SAI) interrupt changes the buffer pointers -> glitch

    My guess is that the problem goes away with fewer FIR filter taps.
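
    One way to avoid this race, sketched under assumptions: the callbacks only record which half is ready, and the main loop takes its own copy of the pointers and clears the flag before it starts processing. All names here (readyHalf, mark_half_ready, take_ready_half, processing_overran) are hypothetical; the buffers mirror the ones in the posted code. On the target, the read-and-clear in take_ready_half() would probably need to be made atomic, for example by briefly disabling interrupts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUFFER_SIZE 2048

static int32_t adcData[BUFFER_SIZE];
static int32_t dacData[BUFFER_SIZE];

/* 0 = nothing pending, 1 = first half ready, 2 = second half ready */
static volatile uint8_t readyHalf = 0;

/* Intended to be called from the DMA half-complete / complete callbacks. */
static void mark_half_ready(uint8_t half)
{
    readyHalf = half;
}

/* Intended to be called from the main loop. Returns 1 and fills *in / *out
 * with private copies of the half-buffer pointers if a half is pending.   */
static int take_ready_half(int32_t **in, int32_t **out)
{
    assert(in != NULL && out != NULL);

    uint8_t half = readyHalf;
    if (half == 0) {
        return 0;
    }
    /* Clear BEFORE processing so a callback that fires during processing
     * is not wiped out afterwards. (Not atomic as written.)             */
    readyHalf = 0;

    size_t offset = (half == 2) ? (BUFFER_SIZE / 2) : 0;
    *in  = &adcData[offset];
    *out = &dacData[offset];
    return 1;
}

/* Call right after processing: non-zero means the next half became ready
 * while we were still working, so the DMA is already reusing our half.  */
static int processing_overran(void)
{
    return readyHalf != 0;
}
```

    In this sketch mark_half_ready(1) would be called from HAL_I2SEx_TxRxHalfCpltCallback() and mark_half_ready(2) from HAL_I2SEx_TxRxCpltCallback(). Checking processing_overran() right after the processing loop shows whether the work took longer than one half-buffer period.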

     

    And I would try to measure cycle counts for the complete signal processing, including the float conversions.