Graduate
June 17, 2025
Solved

STM32H7 JPEG encoder MCU blocks - corrupted JPEG image

I am having trouble understanding the MCU block layout for the JPEG encoder. I have an STM32H743 MCU connected to a video decoder over the DCMI interface. In RAM I have the captured 8-bit ITU-R BT.656 YCrCb 4:2:2 output. After saving the captured data from RAM to a debug file, I can see that the image is captured properly. The byte stream looks like: Cb0-1, Y0, Cr0-1, Y1, Cb2-3, Y2, Cr2-3, Y3, ... Each line has 720 Y, 360 Cb and 360 Cr samples, i.e. 1440 bytes plus 8 bytes of blanking. I have 240 lines, so the resolution I am capturing is 720x240.

As described in AN4996, each 4:2:2 MCU contains 256 bytes organized as two 8×8 Y blocks, one 8×8 Cb block and one 8×8 Cr block. For a 720×240 image, I need 45 horizontal MCUs × 30 vertical MCUs = 1,350 MCUs in total. What confuses me is the two Y blocks. I implemented them as:
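The MCU count above can be checked with a few lines of C. This is only a minimal sketch, assuming a 4:2:2 MCU that covers 16×8 pixels and image dimensions that are exact multiples of the MCU size; `Mcu422Grid` and `mcu422_grid` are names made up for illustration and are not part of the ST HAL.

```c
#include <stdint.h>

/* Hypothetical helper type, not part of the ST HAL. */
typedef struct {
    uint32_t cols;   /* MCUs per image row       */
    uint32_t rows;   /* MCU rows in the image    */
    uint32_t total;  /* total number of MCUs     */
    uint32_t bytes;  /* total bytes of MCU data  */
} Mcu422Grid;

/* A 4:2:2 MCU covers 16x8 pixels: 2 Y blocks + 1 Cb + 1 Cr block,
 * 64 bytes each, 256 bytes in total.
 * Assumption: width is a multiple of 16 and height a multiple of 8;
 * partial MCUs at the image edges are not handled here. */
static Mcu422Grid mcu422_grid(uint32_t width, uint32_t height)
{
    Mcu422Grid g;
    g.cols  = width / 16u;
    g.rows  = height / 8u;
    g.total = g.cols * g.rows;
    g.bytes = g.total * 256u;
    return g;
}
```

For 720×240 this gives 45 × 30 = 1350 MCUs and 345,600 bytes of MCU data, which equals the 720 × 240 × 2 bytes of packed 4:2:2 pixel data without blanking.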

Y1 first row:
y0, y1 ... y7
Y1 second row:
y16, y17...y23
Y2 first row:
y8, y9, ... y15
Y2 second row:
y24, y25 .. y31
Is this correct?

So Y1 contains the first 8 pixel columns, and Y2 contains the second 8 pixel columns?
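One way to state the layout in the question precisely is as a mapping from a pixel position inside the 16×8 MCU to a byte offset in the 256-byte MCU buffer. The sketch below assumes the block order used in this thread (Y1, Y2, Cb, Cr, 64 bytes each); the function names are hypothetical.

```c
#include <stdint.h>

/* Block start offsets inside one 256-byte 4:2:2 MCU (order as in the post). */
enum { Y1_OFF = 0, Y2_OFF = 64, CB_OFF = 128, CR_OFF = 192 };

/* x: 0..15 (pixel column in the MCU), y: 0..7 (line in the MCU).
 * Columns 0..7 go to Y block 1, columns 8..15 to Y block 2. */
static uint32_t mcu422_luma_offset(uint32_t x, uint32_t y)
{
    uint32_t block = (x < 8u) ? (uint32_t)Y1_OFF : (uint32_t)Y2_OFF;
    return block + y * 8u + (x & 7u);
}

/* In 4:2:2 one Cb/Cr pair is shared by two horizontally adjacent pixels,
 * so the chroma column is x / 2. */
static uint32_t mcu422_cb_offset(uint32_t x, uint32_t y)
{
    return (uint32_t)CB_OFF + y * 8u + x / 2u;
}

static uint32_t mcu422_cr_offset(uint32_t x, uint32_t y)
{
    return (uint32_t)CR_OFF + y * 8u + x / 2u;
}
```

With the numbering from the question, y16 is pixel (0, 1) and lands at offset 8 (second row of Y1), while y8 is pixel (8, 0) and lands at offset 64 (first row of Y2).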

Here is my code that processes the ycrcb byte stream:

#include <stdint.h>

#define MCU_BLOCK_ROWS 8
#define MCU_BLOCK_COLS 8
#define MCU_BLOCK_SIZE (MCU_BLOCK_ROWS * MCU_BLOCK_COLS) // 8x8 block = 64 bytes
#define MCU_SIZE 256 // 2 Y blocks + 1 Cb block + 1 Cr block
__attribute__((section(".axiram"))) static uint8_t mcuBuffers[2][MCU_SIZE];

static void ExtractMcuToBuffer(uint8_t* src, uint8_t* dest, uint32_t block_cnt) {
    uint8_t* y1 = dest;       // Y block 1
    uint8_t* y2 = dest + 64;  // Y block 2
    uint8_t* cb = dest + 128; // Cb block
    uint8_t* cr = dest + 192; // Cr block
    // 720x240 -> 720 / 16 pixels, 240 / 8 lines -> 45 x 30 = 1350 MCUs
    uint32_t mcuRowIdx = block_cnt / 45;
    uint32_t mcuColIdx = block_cnt - (mcuRowIdx * 45);
    // Bytes per captured line, including blanking (1440 + 8). Determined by
    // trial and error: this is how many samples the DCMI captures per line.
    uint32_t lineOffset = 1448;
    // MCU extraction for the packed 4:2:2 stream, one 32-bit word (two pixels) at a time
    uint32_t rowIdx = 0; // line inside the MCU, 0..7
    uint32_t colIdx = 0; // 32-bit word inside the MCU line, 0..7
    // Skip MCU_BLOCK_ROWS lines per MCU row, then 32 bytes per MCU column.
    // One MCU line is 32 bytes (16 Y, 8 Cr, 8 Cb); 32 * 45 = 1440 bytes per line.
    uint32_t byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) + mcuColIdx * 32;
    uint32_t* crycby = (uint32_t*)&src[byteOffset];
    for (int i = 0; i < MCU_BLOCK_SIZE; i++) {
        // Bytes 0 and 2 of the word are read as luma, byte 1 as Cr, byte 3 as Cb
        uint8_t Y1 = (uint8_t)((*crycby & 0x000000FF) >> 0);
        uint8_t Cr = (uint8_t)((*crycby & 0x0000FF00) >> 8);
        uint8_t Y2 = (uint8_t)((*crycby & 0x00FF0000) >> 16);
        uint8_t Cb = (uint8_t)((*crycby & 0xFF000000) >> 24);
        crycby++; // advance four bytes
        *cr++ = Cr;
        *cb++ = Cb;

        if (colIdx < MCU_BLOCK_COLS / 2) { // first 8 pixels of the line -> Y block 1
            *y1++ = Y1; // yn
            *y1++ = Y2; // yn+1
        } else { // last 8 pixels of the line -> Y block 2
            *y2++ = Y1; // yn
            *y2++ = Y2; // yn+1
        }
        if (colIdx < MCU_BLOCK_COLS - 1) {
            colIdx++;
        } else { // end of the MCU line: continue on the next image line
            colIdx = 0;
            rowIdx++;
            byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) + rowIdx * lineOffset + mcuColIdx * 32;
            crycby = (uint32_t*)&src[byteOffset];
        }
    }
}
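The source-offset arithmetic in ExtractMcuToBuffer can be pulled out into a small pure function and checked on a PC before running on the target. This is a minimal sketch, assuming the 1448-byte line stride and the 45 MCUs per row given in the post; `mcu_src_offset` and the macro names are hypothetical.

```c
#include <stdint.h>

#define LINE_STRIDE_BYTES 1448u /* 1440 payload bytes + 8 blanking bytes (from the post) */
#define MCUS_PER_ROW      45u   /* 720 pixels / 16 pixels per MCU */
#define MCU_LINE_BYTES    32u   /* 16 pixels x 2 bytes per pixel */
#define MCU_LINES         8u    /* image lines covered by one MCU */

/* Byte offset in the capture buffer of the first byte that belongs to
 * line `row_in_mcu` (0..7) of MCU number `mcu_index` (0..1349). */
static uint32_t mcu_src_offset(uint32_t mcu_index, uint32_t row_in_mcu)
{
    uint32_t mcu_row = mcu_index / MCUS_PER_ROW;
    uint32_t mcu_col = mcu_index % MCUS_PER_ROW;
    uint32_t line    = mcu_row * MCU_LINES + row_in_mcu;
    return line * LINE_STRIDE_BYTES + mcu_col * MCU_LINE_BYTES;
}
```

For the last line of the last MCU (index 1349, row 7) the offset is 347,480, so the final 32-byte read ends at 347,512, inside the 240 × 1448 = 347,520 bytes of the capture.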

The block_cnt goes from 0 to 1349.

With this code, I got this jpeg image:

[image: robbits_0-1750143337325.png]

 

The file header and resolution look okay, but the image content is corrupted, and I am not sure why.

Any ideas?

    2 replies

    Technical Moderator
    June 17, 2025
    rob-bitsAuthor
    Graduate
    June 18, 2025

    Hello @Saket_Om 

     

    Thanks, I have already tried to adapt the example code to my case. However, in the JPEG_Encode_DMA() function the MCU blocks are created with pRGBToYCbCr_Convert_Function(), which may call JPEG_ARGB_MCU_YCbCr422_ConvertBlocks(). But I have YCrCb data, not RGB. I do not want to do any conversion, just encode it into JPEG. If I understand correctly, YCrCb is already the format that JPEG needs. So please advise how to resolve this issue. Do you have an example or tutorial for a YCrCb 4:2:2 input?

    Here is the code that you suggested:

    uint32_t JPEG_Encode_DMA(JPEG_HandleTypeDef *hjpeg, uint32_t RGBImageBufferAddress, uint32_t RGBImageSize_Bytes, uint32_t *jpgBufferAddress)
    {
      pJpegBuffer = jpgBufferAddress;
      uint32_t DataBufferSize = 0;

      /* Reset all Global variables */
      MCU_TotalNb = 0;
      MCU_BlockIndex = 0;
      Jpeg_HWEncodingEnd = 0;
      Output_Is_Paused = 0;
      Input_Is_Paused = 0;

      /* Get RGB Info */
      RGB_GetInfo(&Conf);
      JPEG_GetEncodeColorConvertFunc(&Conf, &pRGBToYCbCr_Convert_Function, &MCU_TotalNb);

      /* Clear Output Buffer */
      Jpeg_OUT_BufferTab.DataBufferSize = 0;
      Jpeg_OUT_BufferTab.State = JPEG_BUFFER_EMPTY;

      /* Fill input Buffers */
      RGB_InputImageIndex = 0;
      RGB_InputImageAddress = RGBImageBufferAddress;
      RGB_InputImageSize_Bytes = RGBImageSize_Bytes;
      DataBufferSize = Conf.ImageWidth * MAX_INPUT_LINES * BYTES_PER_PIXEL;

      if(RGB_InputImageIndex < RGB_InputImageSize_Bytes)
      {
        /* Pre-Processing */
        MCU_BlockIndex += pRGBToYCbCr_Convert_Function((uint8_t *)(RGB_InputImageAddress + RGB_InputImageIndex), Jpeg_IN_BufferTab.DataBuffer, 0, DataBufferSize, (uint32_t*)(&Jpeg_IN_BufferTab.DataBufferSize));
        Jpeg_IN_BufferTab.State = JPEG_BUFFER_FULL;

        RGB_InputImageIndex += DataBufferSize;
      }
    ...

    As you can see, it is for RGB images.

    Thanks!

    Rob

     

    rob-bitsAuthorAnswer
    Graduate
    June 18, 2025

    I found a solution. Basically, my implementation was correct. The example code brings too much complexity and is not easy to integrate. Anyway, the issue I was facing was a data cache issue with DMA. I had to clean the D-cache each time I created an MCU block, something like this:

    	ExtractMcuToBuffer(inputPtr, mcuBuffers[bufferIndex], currentMcu);
    	// Critical: Clean cache after buffer generation
    	SCB_CleanDCache_by_Addr((uint32_t*)mcuBuffers[bufferIndex], MCU_SIZE);
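The Cortex-M7 data cache works in 32-byte lines, so a clean by address effectively covers whole lines. MCU_SIZE (256) is a multiple of 32, but the snippet above does not show whether mcuBuffers itself starts on a 32-byte boundary. The sketch below shows one way to round an arbitrary buffer range out to whole cache lines; `CacheRange` and `cache_line_range` are hypothetical names, and whether the rounding is needed depends on how the buffer is placed and on the CMSIS version in use.

```c
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

/* Hypothetical helper type: a range expanded to whole cache lines. */
typedef struct {
    uintptr_t start; /* start address rounded down to a line boundary */
    uint32_t  size;  /* size in bytes, a multiple of the line size    */
} CacheRange;

/* Round [addr, addr + size) outwards to 32-byte cache line boundaries. */
static CacheRange cache_line_range(uintptr_t addr, uint32_t size)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t end = (addr + size + mask) & ~mask; /* round end up     */
    CacheRange r;
    r.start = addr & ~mask;                       /* round start down */
    r.size  = (uint32_t)(end - r.start);
    return r;
}
```

On the target, the result could be passed on as `SCB_CleanDCache_by_Addr((uint32_t*)r.start, (int32_t)r.size)`. Declaring the buffer with `__attribute__((aligned(32)))` avoids the rounding altogether and keeps unrelated variables out of the cleaned lines.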