Senior
September 26, 2025
Solved

How to fix audio stuttering issues in this implementation?

  • September 26, 2025
  • 3 replies
  • 1010 views

Hi all,

So, after almost half a year of blood, sweat and tears I have managed to scrape together "something that works". It consists of a custom schematic and PCB built around an STM32F042C6T6 and an Si4705 radio tuner (from Skyworks, formerly Silicon Labs).

The STM32 uses its internal clock circuitry, configured as depicted here. The CRS sync source is not shown, but it is set to USB, so the chip runs entirely without external crystals. USB runs at full speed, since that is all this STM32 supports:

ankes_0-1758923920851.png

The Si4705 currently uses a precise, external 32.768 kHz crystal oscillator (SG-3030CM from Seiko) that only requires a bypass capacitor. This crystal oscillator is not connected to the STM32 in any way.

The tuner chip has a prescaler and a precise REFCLK adjustment, so it can, if required, support a reference clock range from 31130 Hz up to 140.89 MHz. Currently these settings are at their default values because of the crystal oscillator.

The STM32 commands the Si4705 via I2C, without DMA, and this communication works just fine. The relevant settings are shown below:

ankes_1-1758924307349.png

The STM32 also uses I2S and DMA to transfer data, and its settings are as shown here:

ankes_2-1758924374394.png

There are also several other settings that are related, and which can all be adjusted:

  • The Si4705 has a programmable sampling rate and bit resolution for I2S, and they are currently set to 44117 Hz (as per "Real Audio Frequency" from the STM32's I2S) and 16 bits (per channel)
  • The DMA buffer in STM32 is 352 bytes long
  • The I2S DMA transfer is initiated with value of 176 "16-bit data lengths" (whatever that means in reality)
  • There is a single EP1 IN that uses double-buffering, and the two PMA areas involved are both 176 bytes each
  • At half-complete and full-complete DMA interrupt the STM32 transfers either 176 bytes from the start of the DMA buffer, or 176 bytes from the half-point of the buffer
  • The USB Audio 1.0's Audio Type I Format is 2 channels, sub-frame size of 2, bit resolution of 16 and frequency of 44.1 kHz
  • The USB Audio's Audio Streaming Endpoint uses 176 bytes as max packet size, and interval of 1
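
The numbers in this list can be cross-checked with a little arithmetic. At 44.1 kHz, 2 channels and 2 bytes per sample, the stream needs 176.4 bytes per 1 ms frame, so a fixed 176-byte packet carries slightly less than the nominal rate (and less still at the 44117 Hz the I2S actually runs at). A common way around this is to send 44 sample frames in most packets and 45 in every tenth. The sketch below shows that bookkeeping; `rate_acc_t`, `rate_acc_next` and `packet_bytes` are hypothetical names, not part of the HAL or of the project's code. It also suggests that "176 16-bit data lengths" equals 352 bytes, assuming the HAL counts 16-bit items in 16-bit mode.

```c
#include <assert.h>
#include <stdint.h>

#define USB_FRAMES_PER_SECOND 1000u  /* full-speed USB: one frame per 1 ms */

/* Hypothetical helper state: carries the fractional part of the
 * samples-per-frame ratio from one USB frame to the next. */
typedef struct {
    uint32_t sample_rate_hz;
    uint32_t remainder;      /* leftover, in units of 1/1000 sample frame */
} rate_acc_t;

static void rate_acc_init(rate_acc_t *acc, uint32_t sample_rate_hz)
{
    assert(acc != 0);
    acc->sample_rate_hz = sample_rate_hz;
    acc->remainder = 0u;
}

/* Number of stereo sample frames to place in the next 1 ms USB packet. */
static uint32_t rate_acc_next(rate_acc_t *acc)
{
    uint32_t total = acc->remainder + acc->sample_rate_hz;
    acc->remainder = total % USB_FRAMES_PER_SECOND;
    return total / USB_FRAMES_PER_SECOND;
}

/* Packet payload size in bytes for a given number of sample frames. */
static uint32_t packet_bytes(uint32_t sample_frames, uint32_t channels,
                             uint32_t bytes_per_sample)
{
    return sample_frames * channels * bytes_per_sample;
}
```

With 44100 Hz this yields nine packets of 44 sample frames (176 bytes) followed by one of 45 (180 bytes), so the endpoint's max packet size would need to be at least 180 bytes. With 48000 Hz every packet is 48 sample frames (192 bytes) and the remainder is always zero.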

With these in place, I have managed to obtain "almost acceptable" audio from the tuner device, recorded it, and attached both an MP3 and a WAV sample here.

However, I am not exactly sure which of all the various settings I should try to adjust in order to overcome the clear skipping/stuttering that the audio has. I have tried fiddling with all the above settings but no matter what I try I only seem to get worse results. Clearly I do not fully understand how all these various settings work together to produce the end result.

The code and schematics are available in https://github.com/anttikes/usb-fm-radio

If I look at the Wireshark output I see an almost-clear recurring phenomenon: the device first sends approximately 22 frames with a packet data length of 1760 (10 microframes), then several frames with data lengths of 1056, 880 or 1584, before resuming with 1760 ones. Even these "shorter" packets claim to have 10 microframes in them. This repeats around every 22-24 frames, but sometimes there are up to 30 "clean" frames before the shorter ones occur again.

This looks like a bus saturation issue to me, with the host side unable to keep up. What could I do to try to solve this? I know I am not far off from the result.
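
One observation on the Wireshark numbers: every length quoted is a whole multiple of the 176-byte packet size (1760 = 10 x 176, 1584 = 9 x 176, 1056 = 6 x 176, 880 = 5 x 176). That pattern would be consistent with whole packets being empty in some frames, for example because no data was armed on the endpoint when the host polled it, rather than with packets being truncated. The capture alone cannot prove this. A minimal sketch of the check, with `missing_packets` as a hypothetical helper name:

```c
#include <stdint.h>

#define PACKET_BYTES    176u  /* max packet size of the audio endpoint */
#define PACKETS_PER_URB 10u   /* packets per capture entry, as seen in Wireshark */

/* Returns how many of the packets in one capture entry were empty,
 * or -1 if the length is not a whole number of full packets
 * (which would point to truncated packets instead). */
static int32_t missing_packets(uint32_t urb_data_length)
{
    if (urb_data_length % PACKET_BYTES != 0u) {
        return -1;
    }
    uint32_t filled = urb_data_length / PACKET_BYTES;
    if (filled > PACKETS_PER_URB) {
        return -1;
    }
    return (int32_t)(PACKETS_PER_URB - filled);
}
```

Applied to the lengths above, 1584 means one empty packet out of ten, 1056 means four, and 880 means five.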


3 replies

AScha.3
Super User
September 27, 2025

Hi,

I think you have a synchronization problem. Who is the master that sets the clock rate for the I2S? The STM32's internal PLL clock.

But since this is a USB device, the master clock for the USB comes from the PC!

So you have a big problem: you can't change the PC-to-device timing, so you have to adjust the I2S sampling rate dynamically to match the average data rate on the PC side.

Something like resampling is needed.

ankes (Author)
Senior
September 27, 2025

Thanks @AScha.3,

To clarify, this is a radio tuner setup. The radio chip is tuned to an FM station, receives the audio transmission, digitizes it internally, and provides that data to the STM32 via I2S. The tuner chip can be configured for 8-, 16-, 20- or 24-bit resolution and a programmable sampling frequency between 32000 and 48000 samples per second. The I2S on the STM32 only supports 16-, 24- and 32-bit resolutions, although I am aware that I could do some conversion in the I2S interrupt routines if I wanted to use e.g. the 8-bit resolution on the tuner chip.

As far as I know, USB Audio 1.0 doesn't seem to provide any sort of "rate feedback" data from the audio source to the host PC. This means that even if I were to e.g. split the I2S WS signal into a timer on the STM32 (in addition to the tuner chip itself), and this way keep accurate track of the "real sampling frequency" then there's no way for me to report this to the host.
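
A note on rate feedback for the IN direction: as far as I understand the USB Audio 1.0 specification, a source endpoint declared as asynchronous does not need a separate feedback pipe, because the device sets the pace itself through the number of samples it puts into each packet, and the host is expected to follow. Below is a sketch of what such an endpoint descriptor could look like. The endpoint address matches the EP1 IN mentioned earlier; the 180-byte max packet size (room for 45 sample frames) is my assumption, and the array name and the two helper functions are hypothetical.

```c
#include <stdint.h>

/* Standard AS isochronous audio data endpoint descriptor
 * (USB Audio 1.0 layout, 9 bytes). Values are illustrative. */
static const uint8_t audio_in_endpoint_desc[] = {
    0x09,        /* bLength */
    0x05,        /* bDescriptorType: ENDPOINT */
    0x81,        /* bEndpointAddress: EP1 IN */
    0x05,        /* bmAttributes: isochronous (bits 1..0 = 01),
                    asynchronous (bits 3..2 = 01) */
    0xB4, 0x00,  /* wMaxPacketSize: 180 bytes (assumed headroom for 45 frames) */
    0x01,        /* bInterval: one packet per 1 ms frame */
    0x00,        /* bRefresh: unused here */
    0x00         /* bSynchAddress: no synch endpoint */
};

/* Synchronization type from bmAttributes: 0 none, 1 async, 2 adaptive, 3 sync. */
static uint8_t ep_sync_type(const uint8_t *desc)
{
    return (uint8_t)((desc[3] >> 2) & 0x03u);
}

/* Little-endian wMaxPacketSize field. */
static uint16_t ep_max_packet(const uint8_t *desc)
{
    return (uint16_t)(desc[4] | ((uint16_t)desc[5] << 8));
}
```

With a descriptor like this, the firmware would send exactly as many sample frames per packet as the tuner delivered in that millisecond, rather than a fixed count.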

I am also considering an approach where I would use an HSE crystal on the STM32, enable the MCO (or a timer), and use that to provide a reference clock for the tuner chip. This would prevent any "clock drift" between the STM32 and the SG-3030CM, but I do not know if this would even help with the problem.

The tuner chip also supports analog output so I could, if I wanted to, create a new PCB where the analog audio traces go to the STM32 and its ADC is then used to digitize the data. This would entirely remove any clock drift between the chips.

However, my first task is to identify the root cause of the problem. Why is the host receiving smaller-than-defined packets every 24-30 frames in Wireshark? I have tried switching USB ports and using a different cable, but this did not affect the outcome.

AScha.3
Super User
September 27, 2025

Puh, first of all: can you set the radio chip to 48 kHz / 16-bit stereo?

This is the standard format for the Windows and Linux mixers and works fine. You have to send a packet every 1 ms with 48 samples per channel, so 48 x 2 x 2 = 192 bytes per block, which matches 48 kHz 16-bit stereo perfectly.

The remaining problem is how to sync the radio chip to this. As far as I know, the simplest way is: you receive the data at the 48 kHz rate set on the radio, but you write the 96 words to the USB send buffer at the rate the USB requests them, no matter whether you really have 48 samples ready, or one too few, or one too many.

So this is poor man's resampling: just duplicate one sample to fill the buffer, or cut one and throw it away. The following samples will then match the USB speed until the clock drift needs correcting again.
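
A minimal sketch of this duplicate-or-drop idea, assuming 48 kHz 16-bit stereo and one packet per 1 ms frame. `fill_packet` is a hypothetical helper, not a HAL function. It expects the caller to pass the number of stereo frames currently waiting in the I2S buffer, and it drops at most one frame per packet so that each correction stays small.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHANNELS          2u
#define FRAMES_PER_PACKET 48u  /* 48 kHz, one packet per 1 ms USB frame */

/* Builds one fixed-size USB packet from 'available' stereo frames in src.
 * Too few frames: the last frame is repeated to fill the packet.
 * Too many frames: one extra frame is consumed and thrown away.
 * Returns the number of source frames consumed. */
static uint32_t fill_packet(int16_t *packet, const int16_t *src,
                            uint32_t available)
{
    assert(packet != 0 && src != 0 && available > 0u);

    uint32_t to_copy = (available < FRAMES_PER_PACKET) ? available
                                                       : FRAMES_PER_PACKET;
    memcpy(packet, src, to_copy * CHANNELS * sizeof(int16_t));

    /* Pad a short packet by repeating the last real frame. */
    for (uint32_t f = to_copy; f < FRAMES_PER_PACKET; f++) {
        for (uint32_t ch = 0u; ch < CHANNELS; ch++) {
            packet[f * CHANNELS + ch] = packet[(to_copy - 1u) * CHANNELS + ch];
        }
    }

    /* If the source is running ahead, swallow exactly one extra frame. */
    return (available > FRAMES_PER_PACKET) ? FRAMES_PER_PACKET + 1u : to_copy;
}
```

The repeated or dropped frame is a small discontinuity, but a single frame at 48 kHz is usually far less audible than a whole missing packet.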

ankes (Author)
Senior
September 27, 2025

I have managed to progress further on this matter.

Initially I had the DMA buffer sized so that it was able to hold 2 ms worth of audio data (352 bytes), and then at HT and TC interrupts I would send the first half, and then the second half. The USB Audio endpoint wMaxPacketSize was set accordingly to 176 bytes, and a bInterval value of 0x01.

I then made an adjustment so that the DMA buffer is now 4 ms long (704 bytes), and at each interrupt I again transfer half of it. The USB Audio endpoint wMaxPacketSize was sized up accordingly to 352 bytes and bInterval raised to 0x02.

This reduced the stutter to almost imperceptible, as shown in the two attached samples.

In Wireshark, the packet data length now stays at a near-constant 1760, with an occasional blip down to 1408 (which is then audible as a stutter). I must be getting close. If only I could better understand what the system is doing as a whole...

ankes_0-1758971879023.png

ankes_1-1758971897149.png


ankes (Author, best answer)
Senior
October 3, 2025

After some creative tinkering with the clock settings shown below

ankes_0-1759518995595.png

I managed to obtain an exact 48 kHz "real audio frequency" for the I2S. This has reduced stuttering quite a bit but I still hear occasional blips here and there.

I believe the only way to get even cleaner sound is to go lower-level and ditch HAL. I am aware that the "real audio frequency" might not be exactly what I get, and this may contribute to the overall scheme of things but I am running low on flash & RAM when using HAL like this so I'd need to do it anyway.

However, the "low-level USB" and "HAL PCD" layers in the STM32F0 HAL driver seem to be intertwined in a way that isn't obvious at first. For example, the "HAL_PCD_EP_Transmit" function calls "USB_EPStartXfer", which makes perfect sense (HAL -> LL -> registers), but "USB_EPStartXfer" in turn uses things like "PCD_GET_ENDPOINT" and "PCD_CLEAR_BULK_EP_DBUF", which clearly reside on the "HAL side of things", even though they are just macros defined in a header file and not "source code" per se. I would have put them into "stm32f0xx_ll_usb.h" and named them differently.

But that's a topic for a different question I guess.