Associate II
June 17, 2025
Solved

STM32H7 JPEG encoder MCU blocks - corrupted JPEG image

I am having trouble understanding the MCU block layout for the JPEG encoder. I have an STM32H743 MCU connected to a video decoder over the DCMI interface. In RAM I have the captured 8-bit ITU-R BT.656 YCrCb 4:2:2 output. After saving the captured data from RAM to a debug file, I can see that the image is captured properly. The byte stream looks like: Cb0-1, Y0, Cr0-1, Y1, Cb2-3, Y2, Cr2-3, Y3, ... with 720 Y, 360 Cb and 360 Cr samples per line, i.e. 1440 bytes plus 8 bytes of blanking per line. I have 240 lines, so the resolution I am capturing is 720x240.

As described in AN4996, each 4:2:2 MCU contains 256 bytes organized as two 8×8 Y blocks plus one 8×8 Cb block plus one 8×8 Cr block. For a 720×240 image, I should need 45 horizontal MCUs × 30 vertical MCUs = 1,350 total MCUs. My confusion is about the two Y blocks. I implemented them as follows:
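The MCU arithmetic above can be checked with a small host-side sketch. The names `McuGrid` and `mcu_grid_422` are hypothetical and only illustrate the calculation; the sketch assumes the 16x8-pixel, 256-byte 4:2:2 MCU described in the post and an image whose width and height are exact multiples of the MCU size.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of one YCbCr 4:2:2 MCU as described in the post:
 * 16x8 pixels, stored as four 64-byte blocks (Y, Y, Cb, Cr). */
enum {
    MCU_WIDTH_PX  = 16,
    MCU_HEIGHT_PX = 8,
    BLOCK_BYTES   = 64,
    MCU_BYTES     = 4 * BLOCK_BYTES /* 256 */
};

/* Hypothetical helper type, not part of any ST library. */
typedef struct {
    uint32_t cols;  /* MCUs per MCU row        */
    uint32_t rows;  /* number of MCU rows      */
    uint32_t total; /* cols * rows             */
    uint32_t bytes; /* total MCU-ordered bytes */
} McuGrid;

/* Assumes width is a multiple of 16 and height a multiple of 8. */
static McuGrid mcu_grid_422(uint32_t width, uint32_t height)
{
    assert(width % MCU_WIDTH_PX == 0);
    assert(height % MCU_HEIGHT_PX == 0);

    McuGrid g;
    g.cols  = width / MCU_WIDTH_PX;
    g.rows  = height / MCU_HEIGHT_PX;
    g.total = g.cols * g.rows;
    g.bytes = g.total * MCU_BYTES;
    return g;
}
```

For 720x240 this gives 45 x 30 = 1350 MCUs, and the MCU-ordered data takes the same number of bytes as the active video (2 bytes per pixel), since 4:2:2 reordering neither adds nor drops samples.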

Y1 first row:
y0, y1 ... y7
Y1 second row:
y16, y17...y23
Y2 first row:
y8, y9, ... y15
Y2 second row:
y24, y25 .. y31
Is this correct?

So Y1 contains the first 8 pixel columns and Y2 contains the second 8 pixel columns?
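The layout asked about here can be written as an index mapping. This is a minimal sketch assuming the block order Y1, Y2, Cb, Cr (64 bytes each, row-major), which matches the description above; `mcu_luma_offset` and `mcu_chroma_offset` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a luma sample inside one 256-byte 4:2:2 MCU.
 * x: 0..15 (pixel column within the MCU), y: 0..7 (row within the MCU).
 * Assumed layout: [Y block 1][Y block 2][Cb][Cr], each 8x8, row-major. */
static uint32_t mcu_luma_offset(uint32_t x, uint32_t y)
{
    assert(x < 16 && y < 8);
    uint32_t block = x / 8; /* 0 = left Y block, 1 = right Y block */
    return block * 64 + y * 8 + (x % 8);
}

/* Byte offset of a chroma sample inside the same MCU.
 * cx: 0..7 (one chroma sample per horizontal pixel pair), y: 0..7.
 * is_cr selects the Cr block (offset 192) instead of Cb (offset 128). */
static uint32_t mcu_chroma_offset(uint32_t cx, uint32_t y, int is_cr)
{
    assert(cx < 8 && y < 8);
    return (is_cr ? 192u : 128u) + y * 8 + cx;
}
```

With the numbering used in the question (y0..y15 on the first line of the MCU, y16..y31 on the second), sample y16 is (x=0, y=1) and lands at offset 8 in the first Y block, while y8 is (x=8, y=0) and lands at offset 64, the start of the second Y block.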

Here is my code that processes the ycrcb byte stream:

#include <stdint.h>

#define MCU_BLOCK_ROWS 8 // rows in one 8x8 block
#define MCU_BLOCK_COLS 8 // columns in one 8x8 block
#define MCU_BLOCK_SIZE (MCU_BLOCK_ROWS * MCU_BLOCK_COLS) // 8x8 block = 64 bytes
#define MCU_SIZE 256 // 2 Y blocks + 1 Cb block + 1 Cr block = 4 * 64 bytes
__attribute__((section(".axiram"))) static uint8_t mcuBuffers[2][MCU_SIZE];

static void ExtractMcuToBuffer(uint8_t* src, uint8_t* dest, uint32_t block_cnt) {
    uint8_t* y1 = dest;       // Y block 1
    uint8_t* y2 = dest + 64;  // Y block 2
    uint8_t* cb = dest + 128; // Cb block
    uint8_t* cr = dest + 192; // Cr block
    // 720x240 -> 720 / 16 = 45 MCU columns, 240 / 8 = 30 MCU rows -> 45 x 30 = 1350 MCUs
    uint32_t mcuRowIdx = block_cnt / 45;
    uint32_t mcuColIdx = block_cnt - (mcuRowIdx * 45);
    // the number of bytes in one line: left offset + right offset + width = 1576,
    // 4 bytes were 1 pixel with this offset
    // Bytes per captured line, determined by trial and error: this is how many
    // bytes DCMI captures per line, including blanking.
    uint32_t lineOffset = 1448;
    // Optimized MCU extraction for 4:2:2 UYVY format
    uint32_t rowIdx = 0;
    uint32_t colIdx = 0;
    uint32_t byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) // skip MCU_BLOCK_ROWS lines for every MCU row above this one
            + (lineOffset * 0) // line within the MCU: we start at line 0; later lines are added via rowIdx below
            + mcuColIdx * 32;  // horizontal offset of this MCU: each MCU covers 32 bytes per line (16 x Y, 8 x Cr, 8 x Cb); 45 * 32 = 1440 bytes
    uint32_t* crycby = (uint32_t*)&src[byteOffset];
    // Each iteration consumes one 32-bit word = 2 pixels; 64 iterations = 16 x 8 pixels
    for (int i = 0; i < MCU_BLOCK_SIZE; i++) {
        uint8_t Y1 = (uint8_t) ((*crycby & 0x000000FF) >> 0);
        uint8_t Cr = (uint8_t) ((*crycby & 0x0000FF00) >> 8);
        uint8_t Y2 = (uint8_t) ((*crycby & 0x00FF0000) >> 16);
        uint8_t Cb = (uint8_t) ((*crycby & 0xFF000000) >> 24);
        crycby++; // advance the address by four bytes
        *cr++ = Cr; // cr
        *cb++ = Cb; // cb

        if (colIdx < MCU_BLOCK_COLS / 2) { // first 4 words of the line: building Y1
            *y1++ = Y1; // yn
            *y1++ = Y2; // yn+1
        } else { // last 4 words of the line: building Y2
            *y2++ = Y1; // yn
            *y2++ = Y2; // yn+1
        }
        if (colIdx < MCU_BLOCK_COLS - 1) {
            colIdx++;
        } else { // here we switch to the next line of the MCU
            colIdx = 0;
            rowIdx++;
            byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) + rowIdx * lineOffset + mcuColIdx * 32;
            crycby = (uint32_t*)&src[byteOffset];
        }
    }
}

The block_cnt goes from 0 to 1349.

With this code, I get this JPEG image:

[Image: robbits_0-1750143337325.png]

The file header and resolution look okay, but the content is corrupted, and I am not sure why.

Any idea?

2 replies

Technical Moderator
June 17, 2025

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.
Saket_Om

rob-bits (Author)
Associate II
June 18, 2025

Hello @Saket_Om 

 

Thanks, I have already tried to apply the example code to my case. However, in the JPEG_Encode_DMA() function, the MCU blocks are created with pRGBToYCbCr_Convert_Function(), which may call the JPEG_ARGB_MCU_YCbCr422_ConvertBlocks() function. I have YCrCb data, not RGB. I do not want to do any conversion, just encode it into JPEG. If I understand correctly, YCrCb is the format that the JPEG encoder needs. So please guide me on how to resolve this issue. Do you have an example or tutorial for YCrCb 4:2:2 input?

Here is the code that you suggested:

uint32_t JPEG_Encode_DMA(JPEG_HandleTypeDef *hjpeg, uint32_t RGBImageBufferAddress, uint32_t RGBImageSize_Bytes, uint32_t *jpgBufferAddress)
{
    pJpegBuffer = jpgBufferAddress;
    uint32_t DataBufferSize = 0;

    /* Reset all Global variables */
    MCU_TotalNb = 0;
    MCU_BlockIndex = 0;
    Jpeg_HWEncodingEnd = 0;
    Output_Is_Paused = 0;
    Input_Is_Paused = 0;

    /* Get RGB Info */
    RGB_GetInfo(&Conf);
    JPEG_GetEncodeColorConvertFunc(&Conf, &pRGBToYCbCr_Convert_Function, &MCU_TotalNb);

    /* Clear Output Buffer */
    Jpeg_OUT_BufferTab.DataBufferSize = 0;
    Jpeg_OUT_BufferTab.State = JPEG_BUFFER_EMPTY;

    /* Fill input Buffers */
    RGB_InputImageIndex = 0;
    RGB_InputImageAddress = RGBImageBufferAddress;
    RGB_InputImageSize_Bytes = RGBImageSize_Bytes;
    DataBufferSize = Conf.ImageWidth * MAX_INPUT_LINES * BYTES_PER_PIXEL;

    if (RGB_InputImageIndex < RGB_InputImageSize_Bytes)
    {
        /* Pre-Processing */
        MCU_BlockIndex += pRGBToYCbCr_Convert_Function((uint8_t *)(RGB_InputImageAddress + RGB_InputImageIndex), Jpeg_IN_BufferTab.DataBuffer, 0, DataBufferSize, (uint32_t*)(&Jpeg_IN_BufferTab.DataBufferSize));
        Jpeg_IN_BufferTab.State = JPEG_BUFFER_FULL;

        RGB_InputImageIndex += DataBufferSize;
    }
...

As you can see, it is for RGB images.

Thanks!

Rob

 

rob-bits (Author, Best answer)
Associate II
June 18, 2025

I found a solution. Basically, my implementation was correct. The example code brings too much complexity and is not easy to integrate. Anyway, the issue I was facing was a cache coherency problem with DMA. I had to clean the D-cache each time I created an MCU block, so that the data written by the CPU actually reaches RAM before the DMA reads it. Something like this:

	ExtractMcuToBuffer(inputPtr, mcuBuffers[bufferIndex], currentMcu);
	// Critical: Clean cache after buffer generation
	SCB_CleanDCache_by_Addr((uint32_t*)mcuBuffers[bufferIndex], MCU_SIZE);
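
Cache maintenance on the Cortex-M7 works on whole 32-byte cache lines, so it can help to check that the cleaned range covers complete lines. The following is a minimal sketch of that alignment arithmetic only; `CacheRange` and `cache_aligned_range` are hypothetical helpers, and depending on the CMSIS version `SCB_CleanDCache_by_Addr` may already round the range internally.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

/* Hypothetical helper type, not part of CMSIS. */
typedef struct {
    uintptr_t addr; /* start address rounded down to a cache line */
    uint32_t  size; /* length rounded up to whole cache lines     */
} CacheRange;

/* Expand [addr, addr + size) to the smallest cache-line-aligned range. */
static CacheRange cache_aligned_range(uintptr_t addr, uint32_t size)
{
    assert(size > 0);
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);

    CacheRange r;
    r.addr = addr & mask;
    uintptr_t end = (addr + size + DCACHE_LINE_SIZE - 1u) & mask;
    r.size = (uint32_t)(end - r.addr);
    return r;
}

/* On target, the result would be passed to the CMSIS call, for example:
 *   CacheRange r = cache_aligned_range((uintptr_t)mcuBuffers[bufferIndex], MCU_SIZE);
 *   SCB_CleanDCache_by_Addr((uint32_t *)r.addr, (int32_t)r.size);
 */
```

A 256-byte MCU buffer that starts on a 32-byte boundary already spans exactly eight cache lines, so the call in the answer needs no adjustment in that case. Whether the linker actually places `mcuBuffers` on such a boundary is an assumption to verify, for example by adding an alignment attribute.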