svogl
Associate III
February 3, 2026
Solved

STM32N6 SD-Card write performance (storing videos..)


Dear all, 
We are using the STM32N6570-DK to encode a video stream (800x600 px) and store it at a reasonable data rate on an external SD card.

I have extended the VENC_SDCARD_ThreadX example somewhat to handle the obvious timing differences between the video capture loop and the disk write loop. The sample code is a great starting point, but at the moment we only reach a write rate of about 25 kb/s with FileX.

We initially used the HAL implementation, but I have changed the fx_stm32_sd_driver_glue.c code slightly to use the BSP layer for card speed negotiation, which is not present in the HAL layer; debug output shows that we are at least using 4-bit wide data transfers:

CardInfo type 1, version 1, class 1461, spd 0
Inst PWR 00000013 CLKCR 00004004 
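As a quick sanity check on that debug line, the sketch below decodes the CLKCR value. It assumes the N6 SDMMC uses the same CLKCR layout as the STM32H7 SDMMC (CLKDIV in bits 9:0, WIDBUS in bits 15:14, card clock = kernel clock / (2 × CLKDIV), bypass when CLKDIV is 0); please verify this against the STM32N6 reference manual. Under that assumption, 0x00004004 means a 4-bit bus with CLKDIV = 4, i.e. the card clock would be one eighth of the SDMMC kernel clock. The 200 MHz kernel clock used to exercise it is a hypothetical value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SDMMC_CLKCR layout, as on the STM32H7 SDMMC (verify for the N6). */
#define CLKCR_CLKDIV_MASK  0x3FFu  /* CLKDIV, bits 9:0   */
#define CLKCR_WIDBUS_SHIFT 14u     /* WIDBUS, bits 15:14 */

typedef struct {
    uint32_t clkdiv;     /* raw clock divider field */
    unsigned bus_width;  /* data lines: 1, 4 or 8; 0 for the reserved encoding */
} clkcr_fields;

static clkcr_fields decode_clkcr(uint32_t clkcr)
{
    static const unsigned widths[4] = {1u, 4u, 8u, 0u};
    clkcr_fields f;
    f.clkdiv = clkcr & CLKCR_CLKDIV_MASK;
    f.bus_width = widths[(clkcr >> CLKCR_WIDBUS_SHIFT) & 0x3u];
    return f;
}

/* Card clock: kernel clock when CLKDIV == 0 (bypass), else kernel / (2 * CLKDIV). */
static uint32_t card_clock_hz(uint32_t kernel_hz, uint32_t clkdiv)
{
    assert(clkdiv <= CLKCR_CLKDIV_MASK);
    return clkdiv == 0u ? kernel_hz : kernel_hz / (2u * clkdiv);
}
```

If the decode is right, it may be worth checking the clock divider alongside the block-size issue.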

 

It seems that just before the call reaches the HAL layer, the single transaction (typically ~200 blocks at once) is split into single-sector writes in the sd_write_data function (fx_stm32_sd_driver.c):

[Screenshot: sd_write_data in fx_stm32_sd_driver.c]

Is there a reference implementation showing how to set this up correctly?

Since we are at a very early project stage, I can substitute FileX with FatFs or jump to Zephyr if this can be done with reasonable effort.

Btw, the current state is here: https://github.com/svogl/venc-sdcard-threadx/tree/feature/filex-bsp-integration

Some implementation notes:
The BSP layer (stm32n6570_discovery_sd.c) seems to lack an implementation of the 1.8 V voltage switch present on the board (i.e. HAL_SD_DriveTransceiver_1_8V_Callback); I have implemented it in fx_stm32_sd_driver_glue.c.

Thanks for your help with this great board,

Simon

 

Best answer by svogl

Responding to myself:
as a counter-proof and workaround, we have implemented an in-memory FIFO buffer that accumulates consecutive (file-wise unaligned) writes and issues DMA-capable requests. This effectively eliminates the bottleneck. From a low-level OS developer's point of view, I would have expected this to be present in the driver stack.
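A minimal sketch of the kind of accumulation buffer described above (not the author's code, which is not shown in the thread): arbitrary-sized writes, such as a 45-byte header followed by video packets, are copied into an aligned buffer and only full chunks are passed on. Because every chunk is a multiple of 512 bytes and the file starts at offset 0, each downstream write begins on a sector boundary and comes from an aligned buffer, which is what a DMA multi-block path needs. ACC_SIZE, sink_fn, write_acc and the RAM-backed ram_file/ram_sink are hypothetical; on target the sink would presumably wrap fx_file_write().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACC_SIZE (32u * 512u)  /* hypothetical 16 KB chunk, a multiple of 512 */

/* Destination for full chunks; on target e.g. a wrapper around fx_file_write(). */
typedef int (*sink_fn)(void *ctx, const uint8_t *data, size_t len);

typedef struct {
    _Alignas(32) uint8_t buf[ACC_SIZE];  /* cache-line aligned so it can feed DMA */
    size_t fill;                         /* bytes currently buffered */
    sink_fn sink;
    void *ctx;
} write_acc;

static void acc_init(write_acc *a, sink_fn sink, void *ctx)
{
    a->fill = 0;
    a->sink = sink;
    a->ctx = ctx;
}

/* Accept writes of any size; pass on only full ACC_SIZE chunks. */
static int acc_write(write_acc *a, const void *data, size_t len)
{
    const uint8_t *p = data;
    assert(a->sink != NULL);
    while (len > 0) {
        size_t n = ACC_SIZE - a->fill;
        if (n > len) n = len;
        memcpy(a->buf + a->fill, p, n);
        a->fill += n;
        p += n;
        len -= n;
        if (a->fill == ACC_SIZE) {
            if (a->sink(a->ctx, a->buf, ACC_SIZE) != 0) return -1;
            a->fill = 0;
        }
    }
    return 0;
}

/* Write out the remaining tail once, when recording stops. */
static int acc_flush(write_acc *a)
{
    int rc = 0;
    if (a->fill > 0) rc = a->sink(a->ctx, a->buf, a->fill);
    a->fill = 0;
    return rc;
}

/* Host stand-in for the file: collects the bytes and logs each chunk size. */
typedef struct {
    uint8_t data[64 * 1024];
    size_t len;
    size_t chunks[16];
    size_t nchunks;
} ram_file;

static int ram_sink(void *ctx, const uint8_t *d, size_t n)
{
    ram_file *f = ctx;
    if (f->len + n > sizeof f->data || f->nchunks == 16) return -1;
    memcpy(f->data + f->len, d, n);
    f->len += n;
    f->chunks[f->nchunks++] = n;
    return 0;
}
```

Usage: call acc_write() for the header and for every packet, and acc_flush() once when recording stops; only that final write is shorter than a full chunk.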

As we do this for fun AND business, I am happy to share the code, but I would like a solution that works for the rest of the developer community.
Simon


AScha.3
Super User
February 4, 2026

Hi,

To get useful speed for video data, the SD card should be written in blocks of 16 KB to 64 KB.

Basically every disk/file "handling system" should do this. So if you have FileX working now,

look at the settings (in Cube) first; maybe something there sets the maximum block size to 512.

Check and change the cache settings and give it BIG buffers. Try it, generate the code and check.

If it still does single-block writes, why not change this yourself, if it works fine then?

If you prefer to move to FatFs or Zephyr, I cannot promise it will do big block writes out of the box; I have never tried.

Or, for FileX:

Key Capabilities and Requirements
  • Logical Sector Cache: FileX uses a logical sector cache to group multiple small writes into a single larger transfer to the media driver. The efficiency of this is determined by the memory size allocated during the fx_media_open call.
  • Driver Support: While FileX handles the filesystem logic, the ability to perform a true multi-block write (sending multiple 512-byte sectors in one SD command) depends on your specific SDIO/SDMMC driver. Most standard implementations (like those for STM32 or Renesas RA) support this to improve throughput.
  • Performance Optimization: For maximum speed, ensure your write size is a multiple of the sector size (512 bytes) and that your source buffer is aligned on a long-word boundary. Writing large, aligned groups of sectors (up to 32 KB or more) significantly reduces the overhead caused by SD card internal "housekeeping". 
 
Configuration Tips
  • Buffer Size: Use a write buffer that is a multiple of 512 bytes.
  • Write Latency: Be aware that SD cards can have random write latency spikes (up to 250ms) due to internal wear leveling; use ThreadX FIFOs to buffer data if you are logging in real-time.
  • API Usage: Use the standard fx_file_write function; FileX will automatically manage the sector accumulation based on your cache settings.
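The FIFO advice can be sketched as a single-producer/single-consumer byte ring between the capture thread and the SD writer thread, so that a latency spike on the card stalls only the writer. This is an illustration under assumptions: RING_SIZE and all names are hypothetical, and on ThreadX you could equally queue buffer pointers with tx_queue_send()/tx_queue_receive(). C11 atomics on the two running counters let one producer and one consumer run concurrently without a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4096u  /* hypothetical; power of two so index masking works */

typedef struct {
    uint8_t data[RING_SIZE];
    atomic_size_t head;  /* total bytes ever written by the producer (capture) */
    atomic_size_t tail;  /* total bytes ever consumed by the consumer (SD writer) */
} ring;

static void ring_init(ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: returns the bytes actually queued (short if the writer lags). */
static size_t ring_put(ring *r, const uint8_t *src, size_t len)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    assert(head - tail <= RING_SIZE);
    size_t space = RING_SIZE - (head - tail);
    if (len > space) len = space;
    for (size_t i = 0; i < len; i++)
        r->data[(head + i) & (RING_SIZE - 1u)] = src[i];
    atomic_store_explicit(&r->head, head + len, memory_order_release);
    return len;
}

/* Consumer side: returns the bytes actually dequeued. */
static size_t ring_get(ring *r, uint8_t *dst, size_t len)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    if (len > avail) len = avail;
    for (size_t i = 0; i < len; i++)
        dst[i] = r->data[(tail + i) & (RING_SIZE - 1u)];
    atomic_store_explicit(&r->tail, tail + len, memory_order_release);
    return len;
}
```

When ring_put() returns less than requested, the capture side has to decide whether to drop data or wait; sizing the ring for the worst-case latency spike avoids that in practice.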
"If you feel a post has answered your question, please click ""Accept as Solution""."
svogl (Author)
Associate III
February 4, 2026

Hi, thanks for coming back to this on short notice. We have been working from the example code onward; there is no CubeMX support file, but I think we have enabled the right settings in the code.

If you take a glimpse at the stack trace, you can see that the application code sends 100-300 sectors in one go (160 in this example), which are passed down through the FileX stack as expected.
It is the ST-provided fx_stm32_sd_driver.c file that splits them into single-sector writes in a loop, which is not what it should be doing. Could you point us to the latest version of that file?
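A sketch of what a multi-block fallback could look like. My assumption (not verified against the current ST sources) is that the driver drops to a one-sector-at-a-time copy through a sector-sized scratch buffer whenever the buffer it receives is not cache-line aligned. The sketch keeps a single command for aligned buffers and, for unaligned ones, bounces up to SCRATCH_SECTORS sectors per command through a larger aligned scratch buffer. hypothetical_write_blocks and the RAM-backed card are stand-ins so it runs on a host; on target this would wrap the block-write call (e.g. fx_stm32_sd_write_blocks) plus its completion wait, and the scratch buffer would need a D-cache clean (e.g. SCB_CleanDCache_by_Addr) before the DMA starts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE     512u
#define SCRATCH_SECTORS 32u    /* hypothetical: 16 KB bounce buffer */
#define DISK_SECTORS    1024u  /* size of the host-side RAM "card" */

/* Host stand-in for the card so the sketch runs off-target. */
static uint8_t ram_disk[DISK_SECTORS * SECTOR_SIZE];
static unsigned write_calls;   /* number of block-write commands issued */

/* Hypothetical low-level call: ONE multi-block write of `count` sectors.
 * On target this would start a DMA multi-block transfer and wait for completion. */
static int hypothetical_write_blocks(const uint8_t *src, uint32_t sector, uint32_t count)
{
    if (sector + count > DISK_SECTORS) return -1;
    memcpy(&ram_disk[(size_t)sector * SECTOR_SIZE], src, (size_t)count * SECTOR_SIZE);
    write_calls++;
    return 0;
}

/* 32-byte alignment matches the Cortex-M55 D-cache line size. */
static _Alignas(32) uint8_t scratch[SCRATCH_SECTORS * SECTOR_SIZE];

static int write_sectors(const uint8_t *src, uint32_t start, uint32_t num)
{
    assert(src != NULL);
    if (((uintptr_t)src & 31u) == 0u) {
        /* Aligned source: one multi-block command for the whole run. */
        return hypothetical_write_blocks(src, start, num);
    }
    /* Unaligned source: bounce through scratch, up to SCRATCH_SECTORS per command. */
    while (num > 0u) {
        uint32_t chunk = num < SCRATCH_SECTORS ? num : SCRATCH_SECTORS;
        memcpy(scratch, src, (size_t)chunk * SECTOR_SIZE);
        if (hypothetical_write_blocks(scratch, start, chunk) != 0) return -1;
        src += (size_t)chunk * SECTOR_SIZE;
        start += chunk;
        num -= chunk;
    }
    return 0;
}
```

For the 160-sector request in the stack trace, an unaligned buffer would then cost 5 commands of 32 sectors instead of 160 single-sector commands.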

Also, for the N6 I don't see a specific ThreadX support package. Which one of these should we use?

[Screenshot: list of available packs in STM32Cube]

Thanks a lot

Simon

 

AScha.3
Super User
February 4, 2026

Hi,

As the N6xx has a Cortex-M55 core, this should be the package you need.

(I don't see it in your Cube screenshot, so look on GitHub for the M55 port.)

https://github.com/STMicroelectronics/stm32-mw-threadx

+

See the FileX examples for the STM32N6570-DK:

https://github.com/STMicroelectronics/STM32CubeN6

https://github.com/STMicroelectronics/STM32CubeN6/tree/main/Projects/STM32N6570-DK/Applications/FileX

 

But check whether they do multi-block writes or not.

+

I looked at my H743 project, which uses FatFs: same as you found, a multi-sector write is requested, but then single-sector writes are issued:

901: disk_write(fs->drv, fs->win, wsect, 1);


BUT I found multi-sector read and write in ff.c (cc is the sector count):

if (disk_write(fs->drv, wbuff, sect, cc) != RES_OK) ABORT(fs, FR_DISK_ERR)

By the way, I only do fast reads; that works as fast as it should, so I never looked at how it does writes.

So if you don't need an RTOS, I would try just using FatFs.

+

According to AI, it's possible:

FileX on STM32 (including N6xx) manages multi-sector writes by enabling efficient block-based access through the fx_media_write API, which inherently handles multiple sectors when writing large, contiguous buffers, optimized via the low-level driver (e.g., eMMC or SDMMC).
Key Implementation Details for Multi-Sector Write 
  • Buffer Size: To optimize, use a write buffer that is a multiple of the sector size (512 bytes).
  • FileX API: Utilize fx_file_write to write data. If the data size exceeds a single sector, the underlying driver (fx_stm32_sd_driver) handles the multi-sector transfer automatically.
  • Driver Optimization: Ensure the fx_stm32_sd_driver uses DMA to handle the multi-sector write to improve performance. 
Conceptual Code Structure (app_filex.c) 
```c
/* USER CODE BEGIN PV */
FX_FILE my_file;
uint8_t write_buffer[512 * 4]; /* buffer for 4 sectors (2 KB) */
/* USER CODE END PV */

/* ... inside a thread; sdio_disk is the FX_MEDIA opened earlier with fx_media_open() ... */

/* 1. Create and open the file (fx_file_create returns FX_ALREADY_CREATED if it exists) */
fx_file_create(&sdio_disk, "multi_sec.bin");
fx_file_open(&sdio_disk, &my_file, "multi_sec.bin", FX_OPEN_FOR_WRITE);

/* 2. Write four sectors in one call; fx_file_write takes no bytes-written argument */
fx_file_write(&my_file, write_buffer, sizeof(write_buffer));

/* 3. Close the file */
fx_file_close(&my_file);
```
Configuration Notes 
  • Ensure FX_MAX_SECTOR_FLUSH in fx_user.h is configured appropriately for high-speed writing.
  • The fx_stm32_sd_driver must be correctly configured to handle DMA data transfer for maximum efficiency. 

So it seems the fast multi-sector write is only done with DMA (plus some extra settings, the "...flush" one).
svogl (Author)
Associate III
February 4, 2026

Well, I did speed tests with a bare-metal + FatFs implementation earlier on and got results in the range of 500 to 700 kb/s, still a factor of 10 away from what would be expected/needed.

As I see it, the single-block writes keep the SD card controller busy and throttle the overall write performance. Who wrote the original ST driver code? Maybe they have an idea.
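To put numbers on that argument, a simple model: every write command costs a fixed card busy time plus the transfer time on the bus. The 1 ms busy time and the 100 MHz, 4-bit bus used to exercise it are illustrative assumptions, not measurements; real cards vary a lot.

```c
#include <assert.h>

/* Raw SD bus data rate in bytes/s, assuming single data rate signalling. */
static double bus_bytes_per_s(double card_clock_hz, unsigned bus_width_bits)
{
    assert(bus_width_bits == 1u || bus_width_bits == 4u || bus_width_bits == 8u);
    return card_clock_hz * bus_width_bits / 8.0;
}

/* Effective throughput when every write command carries a fixed busy time
 * (assumed constant here) on top of the bus transfer time. */
static double write_throughput(double bytes_per_cmd, double bus_rate, double busy_s)
{
    return bytes_per_cmd / (busy_s + bytes_per_cmd / bus_rate);
}
```

With those assumed numbers, 512-byte commands top out around 0.5 MB/s while 32 KB commands approach 20 MB/s, so the per-command overhead, not the bus, dominates single-block writes.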

Thanks for looking into this,

Simon

AScha.3
Super User
February 4, 2026

Hmm... what is your "700 kb/s"? kbit/s?

I only did reads, tested that too: about 16 MB/s (megabytes per second), without DMA. (I didn't get DMA working :( )

SD unit at 100 MHz clock (div 1 set), 4-bit mode. (Otherwise there is no high speed anyway.)
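For reference, assuming the 100 MHz is the card clock with single data rate, the raw ceiling of a 4-bit bus works out as follows, so 16 MB/s of reads is roughly a third of it. The unit question also matters, since kbit/s and kB/s differ by a factor of 8:

$$\frac{100\ \text{MHz} \times 4\ \text{bit}}{8\ \text{bit/byte}} = 50\ \text{MB/s}, \qquad \frac{16\ \text{MB/s}}{50\ \text{MB/s}} = 0.32, \qquad 700\ \text{kbit/s} = 87.5\ \text{kB/s}$$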

What do you set/use?
svogl (Author)
Associate III
February 5, 2026

Hi AScha.3,

Reading is easy; writing is a fundamentally different operation, as the SD card controller has to write to flash blocks.

I have isolated the behaviour in a minimally changed version of the Fx_uSD_File_Edit example; please find the code here (CubeMX-generated, with no edits other than starting the FileX demo code):

https://github.com/svogl/stm32-fx-usd-file-edit

The code replicates the VENC behavior: it writes an MP4 header first (45 bytes), then writes video packets to the file (app_filex.c, lines 290-294; the DMA-aligned buffers are around line 78).

Put a breakpoint at fx_stm32_sd_write_blocks and you will see single-block writes everywhere; commenting out the header write gives one big DMA write, as expected.
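A simplified model that may explain this, based on my understanding of FileX rather than its actual code: a write that starts mid-sector first completes that sector through the sector cache, and only the whole sectors after it are handed down directly from the caller's buffer. After a 45-byte header, that direct part starts 467 bytes into an aligned packet buffer, so it is no longer 32-byte aligned and the driver would take its unaligned (per-sector) path. split_write, body_is_dma_aligned and the 0x24000000 buffer address are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR 512u

/* Simplified model of how one file write splits against sector boundaries. */
typedef struct {
    size_t head;          /* bytes completing a partially filled sector (cache path) */
    size_t body_sectors;  /* whole sectors that could go down in one request */
    size_t tail;          /* bytes starting a new partial sector (cache path) */
    uintptr_t body_addr;  /* buffer address handed down for the whole sectors */
} write_split;

static write_split split_write(uint64_t file_offset, uintptr_t buf, size_t len)
{
    write_split s = {0, 0, 0, 0};
    size_t used = (size_t)(file_offset % SECTOR);
    if (used != 0) {
        s.head = SECTOR - used;
        if (s.head > len) s.head = len;
    }
    size_t rest = len - s.head;
    s.body_sectors = rest / SECTOR;
    s.tail = rest % SECTOR;
    s.body_addr = buf + s.head;
    return s;
}

/* Would the whole-sector part meet a 32-byte (cache line) DMA alignment rule? */
static int body_is_dma_aligned(const write_split *s)
{
    assert(s != NULL);
    return s->body_sectors == 0 || (s->body_addr & 31u) == 0;
}
```

Without the header (offset 0) the same 160-sector packet maps to 160 aligned sectors, which would be consistent with the single large DMA write seen when the header is commented out. With the header, the 45-byte offset carries over to later packets unless their sizes happen to realign it.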

The file system code is fed with DMA-aligned buffers; I would have expected the file system to copy into its internal sector cache and pass that on to the write function in blocks as large as possible. Apparently this is not happening.

Simon