Unexpected data corruption when camera and display both running (with SDRAM framebuffers in between)
Apologies in advance, this is a long one.
We are developing a camera application on an STM32H747. We have 32GB of SDRAM configured through FMC.
- We have a NanEyeC camera from which we continuously acquire data in packed Bayer format via SPI, into one of three SDRAM framebuffers [A1,A2,A3] (a three-buffer system so that one is being read, one is pending, and one is being written).
- In our main loop we check whether one of the raw camera buffers [A1,A2,A3] has new data. If so, we (1) unpack and debayer it into a 320x320 RGB888 SDRAM framebuffer [B] (we want this intermediate point so that the user can save images from it), then (2) run a scaling operation to transfer it into a 460x460 (our display size) RGB888 SDRAM framebuffer [C].
- After these for loops we start a DMA2D transfer which combines [C] with overlay graphics from a 460x460 ARGB4444 SDRAM framebuffer [D] and writes the result into a 460x460 RGB888 SDRAM framebuffer [E]. The DMA2D end-of-transfer interrupt then initiates an LTDC-DSI operation which processes [E] into pixel commands sent to our MIPI display.
In summary: camera -> [A1,A2,A3] -> unpack and debayer -> [B] -> scale -> [C] -> DMA2D (combine with [D]) -> [E] -> LTDC+DSI processes and sends to display
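To make the [A1,A2,A3] hand-off concrete, here is a minimal sketch of how a three-buffer scheme like this can be tracked. The names (`triple_buf_t`, `tb_frame_done`, `tb_acquire`, `tb_release`) are hypothetical and this is not our actual driver code; it only illustrates the rule that the camera never writes into the buffer the main loop is reading.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for the three raw buffers [A1,A2,A3].
 * Illustrative only; names and layout are assumptions. */
enum { BUF_COUNT = 3, BUF_NONE = -1 };

typedef struct {
    volatile int8_t writing;  /* buffer the camera is currently filling   */
    volatile int8_t pending;  /* newest complete frame, or BUF_NONE       */
    volatile int8_t reading;  /* buffer the main loop holds, or BUF_NONE  */
} triple_buf_t;

static void tb_init(triple_buf_t *tb)
{
    tb->writing = 0;
    tb->pending = BUF_NONE;
    tb->reading = BUF_NONE;
}

/* Return a buffer index that is neither a nor b. */
static int8_t tb_free_index(int8_t a, int8_t b)
{
    for (int8_t i = 0; i < BUF_COUNT; ++i) {
        if (i != a && i != b) {
            return i;
        }
    }
    return BUF_NONE; /* not reachable with three buffers, two exclusions */
}

/* Would be called from the camera frame-complete interrupt. */
static void tb_frame_done(triple_buf_t *tb)
{
    assert(tb->writing != tb->reading);
    tb->pending = tb->writing;  /* an older pending frame is dropped */
    tb->writing = tb_free_index(tb->pending, tb->reading);
}

/* Would be called from the main loop; on target this needs to be
 * guarded against the camera interrupt (assumption, not shown). */
static int8_t tb_acquire(triple_buf_t *tb)
{
    tb->reading = tb->pending;
    tb->pending = BUF_NONE;
    return tb->reading;         /* BUF_NONE if no new frame */
}

static void tb_release(triple_buf_t *tb)
{
    tb->reading = BUF_NONE;
}
```

In this sketch the main loop would call `tb_acquire`, run unpack/debayer/scale on the returned index, then call `tb_release`. If the camera finishes two frames while the main loop is busy, the older pending frame is simply overwritten.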
This all works, mostly, except for a horizontal band of flickering pixels across the middle of the display which we call "disco". I created tools to pull data from any of the framebuffers and view it, so that we can figure out which link in the chain is causing the problem.
Here's where it gets weird. Disco appears in *every* framebuffer, [A1,2,3], [B], [C], and [E], which suggests that the camera is generating the issue -- which would be weird because this camera hardware+driver was proven out before. BUT, when I turn off the DMA2D->LTDC+DSI->display sequence, so that it is only [A1,2,3]->[B]->[C] running as fast as possible, I see the problem *nowhere*. Disco is completely gone from all framebuffers [A],[B],[C], which suggests that the camera and the unpacking/debayering/scaling are fine. Until I turn the display back on.
So the display processing is causing the issue, but the display processing doesn't even *touch* the upstream framebuffers [A] and [B] in which the problem appears. So... what's happening?
The only two theories I have are (1) hardware: the display wires are electrically affecting the camera wires, or (2) the addition of the display processing prevents the camera from properly writing to SDRAM, maybe by overloading the SDRAM. Note that caching is enabled on our SDRAM MPU region, and we clean the cache with SCB_CleanDCache_by_Addr on the source buffer before [A] to [B], before [B] to [C], and before lighting off [C] -> DMA2D.
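One detail that may matter for the cache theory: as far as I understand, the by-address cache operations on the Cortex-M7 act on whole 32-byte cache lines, so two buffers that are not line-aligned can share a line. Below is a minimal host-side sketch of that arithmetic. The helper names (`cache_align_range`, `cache_lines_overlap`) are made up for illustration and are not CMSIS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_BYTES 32u  /* Cortex-M7 D-cache line size */

typedef struct {
    uintptr_t start;   /* line-aligned start address    */
    size_t    length;  /* length in bytes, whole lines  */
} cache_range_t;

/* Expand [addr, addr + len) outward to whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    assert(len > 0);
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_BYTES - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + DCACHE_LINE_BYTES - 1u) & mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* Non-zero if the two buffers touch at least one common cache line,
 * i.e. a clean or invalidate on one would also affect the other. */
static int cache_lines_overlap(uintptr_t a, size_t a_len,
                               uintptr_t b, size_t b_len)
{
    cache_range_t ra = cache_align_range(a, a_len);
    cache_range_t rb = cache_align_range(b, b_len);
    return ra.start < rb.start + rb.length &&
           rb.start < ra.start + ra.length;
}
```

On the target, the aligned start and length are what would be handed to SCB_CleanDCache_by_Addr (before a DMA master reads a buffer the CPU wrote) or SCB_InvalidateDCache_by_Addr (before the CPU reads a buffer a DMA master wrote). Whether our buffers actually share lines is something I still need to check.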
Additional observation: the "disco" starts to recede, then go away, and we display correctly if I slow the unpack/debayer/display sequence *way* down.
Is it possible for us to overload the SDRAM by reading it with DMA2D, either exceeding its bandwidth or simply disrupting concurrent writes, and so produce data corruption? How should we avoid this? Are there any other reasons this might be happening?
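For anyone who wants to reason about the bandwidth side, here is a rough back-of-the-envelope sketch of the SDRAM traffic per processed frame, using the buffer sizes listed above. It leaves out the raw [A] traffic (the packed frame size is not given here), ignores cache effects on the CPU accesses, and assumes the LTDC reads [E] exactly once per camera frame; the frame rate passed in is a placeholder, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

enum { CAM_W = 320, CAM_H = 320, DISP_W = 460, DISP_H = 460 };

/* Size in bytes of one framebuffer. */
static uint32_t frame_bytes(uint32_t w, uint32_t h, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return w * h * bytes_per_pixel;
}

/* SDRAM bytes moved per processed camera frame, excluding raw [A]. */
static uint32_t pipeline_bytes_per_frame(void)
{
    uint32_t b = frame_bytes(CAM_W,  CAM_H,  3);  /* [B] RGB888   */
    uint32_t c = frame_bytes(DISP_W, DISP_H, 3);  /* [C] RGB888   */
    uint32_t d = frame_bytes(DISP_W, DISP_H, 2);  /* [D] ARGB4444 */
    uint32_t e = frame_bytes(DISP_W, DISP_H, 3);  /* [E] RGB888   */

    return b            /* debayer writes [B]                   */
         + b + c        /* scale reads [B], writes [C]          */
         + c + d + e    /* DMA2D reads [C] and [D], writes [E]  */
         + e;           /* LTDC reads [E] (assumed once/frame)  */
}

/* Total bytes per second at a given (assumed) frame rate. */
static uint64_t pipeline_bytes_per_second(uint32_t frames_per_second)
{
    return (uint64_t)pipeline_bytes_per_frame() * frames_per_second;
}
```

With these sizes the pipeline moves about 3.6 MB through SDRAM per frame before counting the camera writes, so at a placeholder 30 frames per second that is roughly 107 MB/s. How that compares to what our SDRAM can sustain depends on the FMC clock and bus width, which I have not included here.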
Thanks for any ideas -- this was a lot of work to have it glitch somewhere, feels kind of like a needle in a haystack.
