SD-Card Read Issue
Hi Community,
We are encountering an issue with the SD card on our STM32MP15x board: writing to the SD card works reliably, but reading shows sporadic errors.
In our configuration, we use 1 TB microSD cards in DDR50 mode. In DDR25 mode, the SD card works fine. Unfortunately, we need DDR50 mode (or faster) for our product.
First, let me describe the test scenario: I wrote ~500 GB of uint64_t ramp data in little-endian byte order to an SD card and verified on an external device that the ramp data had been written correctly. A dump of the first 64 bytes looks as follows:
root@gdl1-31:~# hexdump -C -n 64 /dev/mmcblk0
00000000 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000010 02 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 |................|
00000020 04 00 00 00 00 00 00 00 05 00 00 00 00 00 00 00 |................|
00000030 06 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 |................|
00000040
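For anyone who wants to reproduce the pattern, here is a minimal sketch of how such a ramp can be generated. It is an illustration only, not necessarily the code used for the test above; the function name fill_ramp_le is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill buf with n_words consecutive uint64_t ramp
 * values starting at first. Each value is stored in little-endian byte
 * order regardless of host endianness. buf must hold n_words * 8 bytes. */
void fill_ramp_le(uint8_t *buf, size_t n_words, uint64_t first)
{
    assert(buf != NULL);
    for (size_t w = 0; w < n_words; w++) {
        uint64_t v = first + w;
        for (size_t b = 0; b < 8; b++)
            buf[w * 8 + b] = (uint8_t)(v >> (8 * b)); /* low byte first */
    }
}
```

Filling 8 words starting at 0 gives the same 64 bytes as the hexdump above.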
In the next step, I wrote a small test program in C that validates the ramp data on our STM32MP15x board. The program reports validation errors at random positions, where the ramp value read differs from the expected one. I found that these errors occur in contiguous blocks of 1044480 bytes (1020 KiB). After such a block, the ramp values read match the expected ones again. (The expected value is a monotonic counter that is incremented for every uint64_t read, including during validation errors.)
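The core of such a check can be sketched as follows. This is a simplified illustration under the assumption that the ramp starts at 0 at byte offset 0 of the card (as the hexdump suggests), so the expected value is simply the byte address divided by 8. The names load_le64 and check_ramp are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian uint64_t from 8 bytes. */
uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int b = 7; b >= 0; b--)
        v = (v << 8) | p[b];
    return v;
}

/* Hypothetical checker: buf holds len bytes read from byte offset
 * dev_offset of the card (must be a multiple of 8). The expected value
 * depends only on the position, so it keeps counting through corrupted
 * regions. Returns the number of mismatching words and stores the byte
 * address of the first one in *first_bad (untouched if all match). */
size_t check_ramp(const uint8_t *buf, size_t len, uint64_t dev_offset,
                  uint64_t *first_bad)
{
    assert(dev_offset % 8 == 0);
    size_t bad = 0;
    for (size_t i = 0; i + 8 <= len; i += 8) {
        uint64_t expected = (dev_offset + i) / 8;
        if (load_le64(buf + i) != expected) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = dev_offset + i;
            bad++;
        }
    }
    return bad;
}
```

A caller would run check_ramp on every chunk read from the block device and pass the running byte offset as dev_offset.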
Here is an example excerpt of the validation output (the values are dumped in big endian for better readability):
[...]
Progress 2.1% (10.2 GB / 480.0 GB)
Progress 2.2% (10.8 GB / 480.0 GB)
[read address] [expected value] [read value] <--- columns
000000009218FFB8 [exp] 00 00 00 00 52 43 1F F7 [read] 00 00 00 00 52 43 1F F7
000000009218FFC0 [exp] 00 00 00 00 52 43 1F F8 [read] 00 00 00 00 52 43 1F F8
000000009218FFC8 [exp] 00 00 00 00 52 43 1F F9 [read] 00 00 00 00 52 43 1F F9
000000009218FFD0 [exp] 00 00 00 00 52 43 1F FA [read] 00 00 00 00 52 43 1F FA
000000009218FFD8 [exp] 00 00 00 00 52 43 1F FB [read] 00 00 00 00 52 43 1F FB
000000009218FFE0 [exp] 00 00 00 00 52 43 1F FC [read] 00 00 00 00 52 43 1F FC
000000009218FFE8 [exp] 00 00 00 00 52 43 1F FD [read] 00 00 00 00 52 43 1F FD
000000009218FFF0 [exp] 00 00 00 00 52 43 1F FE [read] 00 00 00 00 52 43 1F FE
000000009218FFF8 [exp] 00 00 00 00 52 43 1F FF [read] 00 00 00 00 52 43 1F FF
[Start of error section at byte address: 0x292190000]
0000000092190000 [exp] 00 00 00 00 52 43 20 00 [read] 20 30 05 94 02 00 00 00
0000000092190008 [exp] 00 00 00 00 52 43 20 01 [read] 20 30 05 A4 02 00 00 00
0000000092190010 [exp] 00 00 00 00 52 43 20 02 [read] 20 30 05 B4 02 00 00 00
0000000092190018 [exp] 00 00 00 00 52 43 20 03 [read] 20 30 05 C4 02 00 00 00
0000000092190020 [exp] 00 00 00 00 52 43 20 04 [read] 20 30 05 D4 02 00 00 00
0000000092190028 [exp] 00 00 00 00 52 43 20 05 [read] 20 30 05 E4 02 00 00 00
0000000092190030 [exp] 00 00 00 00 52 43 20 06 [read] 20 30 05 F4 02 00 00 00
0000000092190038 [exp] 00 00 00 00 52 43 20 07 [read] 20 30 05 04 02 01 00 00
0000000092190040 [exp] 00 00 00 00 52 43 20 08 [read] 20 30 05 14 02 01 00 00
0000000092190048 [exp] 00 00 00 00 52 43 20 09 [read] 20 30 05 24 02 01 00 00
0000000092190050 [exp] 00 00 00 00 52 43 20 0A [read] 20 30 05 34 02 01 00 00
0000000092190058 [exp] 00 00 00 00 52 43 20 0B [read] 20 30 05 44 02 01 00 00
0000000092190060 [exp] 00 00 00 00 52 43 20 0C [read] 20 30 05 54 02 01 00 00
0000000092190068 [exp] 00 00 00 00 52 43 20 0D [read] 20 30 05 64 02 01 00 00
0000000092190070 [exp] 00 00 00 00 52 43 20 0E [read] 20 30 05 74 02 01 00 00
0000000092190078 [exp] 00 00 00 00 52 43 20 0F [read] 20 30 05 84 02 01 00 00
0000000092190080 [exp] 00 00 00 00 52 43 20 10 [read] 20 30 05 94 02 01 00 00
-------------------------------------------------
(Skipping further messages for this error section
-------------------------------------------------
000000009228EFB8 [exp] 00 00 00 00 52 45 1D F7 [read] 00 00 00 00 4F F5 EF F7
000000009228EFC0 [exp] 00 00 00 00 52 45 1D F8 [read] 00 00 00 00 4F F5 EF F8
000000009228EFC8 [exp] 00 00 00 00 52 45 1D F9 [read] 00 00 00 00 4F F5 EF F9
000000009228EFD0 [exp] 00 00 00 00 52 45 1D FA [read] 00 00 00 00 4F F5 EF FA
000000009228EFD8 [exp] 00 00 00 00 52 45 1D FB [read] 00 00 00 00 4F F5 EF FB
000000009228EFE0 [exp] 00 00 00 00 52 45 1D FC [read] 00 00 00 00 4F F5 EF FC
000000009228EFE8 [exp] 00 00 00 00 52 45 1D FD [read] 00 00 00 00 4F F5 EF FD
000000009228EFF0 [exp] 00 00 00 00 52 45 1D FE [read] 00 00 00 00 4F F5 EF FE
000000009228EFF8 [exp] 00 00 00 00 52 45 1D FF [read] 00 00 00 00 4F F5 EF FF
[End of error section at byte address: 0x29228F000]
000000009228F000 [exp] 00 00 00 00 52 45 1E 00 [read] 00 00 00 00 52 45 1E 00
[Completed error section of 1044480 Bytes (1020KiB)]
Number of error sections so far: 8
Progress 1.3% (6.1 GB / 480.0 GB)
Progress 1.4% (6.7 GB / 480.0 GB)
Progress 1.5% (7.2 GB / 480.0 GB)
[...]
The incorrect data also appear to be ramp data, but from another region of the card. In addition, they are not uint64_t-aligned: they appear to be shifted by a number of nibbles (4 bits each). This is visible in the example because the counting digit is the upper nibble of a byte.
Other error blocks show only a small shift relative to the expected data, as in this example:
000000006B1EBFE0 [exp] 00 00 00 00 8D 63 D7 FC [read] 00 00 00 00 8D 63 D7 FC
000000006B1EBFE8 [exp] 00 00 00 00 8D 63 D7 FD [read] 00 00 00 00 8D 63 D7 FD
000000006B1EBFF0 [exp] 00 00 00 00 8D 63 D7 FE [read] 00 00 00 00 8D 63 D7 FE
000000006B1EBFF8 [exp] 00 00 00 00 8D 63 D7 FF [read] 00 00 00 00 8D 63 D7 FF
[Start of error section at byte address: 0x46B1EC000]
000000006B1EC000 [exp] 00 00 00 00 8D 63 D8 00 [read] D8 01 00 00 00 00 8D 63
000000006B1EC008 [exp] 00 00 00 00 8D 63 D8 01 [read] D8 02 00 00 00 00 8D 63
000000006B1EC010 [exp] 00 00 00 00 8D 63 D8 02 [read] D8 03 00 00 00 00 8D 63
000000006B1EC018 [exp] 00 00 00 00 8D 63 D8 03 [read] D8 04 00 00 00 00 8D 63
000000006B1EC020 [exp] 00 00 00 00 8D 63 D8 04 [read] D8 05 00 00 00 00 8D 63
000000006B1EC028 [exp] 00 00 00 00 8D 63 D8 05 [read] D8 06 00 00 00 00 8D 63
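To quantify such a small shift, one can compare the bytes read against the ramp byte stream at a few candidate offsets. The sketch below is an illustration with hypothetical names (ramp_byte, find_byte_shift); it again assumes the ramp starts at 0 at byte offset 0, and it only tests whole-byte shifts, so nibble shifts like the ones in the first example are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte the little-endian ramp should contain at absolute byte position
 * pos: word value = pos / 8, byte index within the word = pos % 8. */
uint8_t ramp_byte(uint64_t pos)
{
    return (uint8_t)((pos / 8) >> (8 * (pos % 8)));
}

/* Hypothetical helper: rd holds len bytes (in memory order) read at
 * byte address dev_offset. Returns the smallest shift s in 0..7 such
 * that rd[i] equals the ramp byte at dev_offset + s + i for all i,
 * or -1 if no whole-byte shift in that range matches. */
int find_byte_shift(const uint8_t *rd, size_t len, uint64_t dev_offset)
{
    assert(rd != NULL);
    for (int s = 0; s < 8; s++) {
        size_t i = 0;
        while (i < len && rd[i] == ramp_byte(dev_offset + (uint64_t)s + i))
            i++;
        if (i == len)
            return s;
    }
    return -1;
}
```

Fed with the first two words of the excerpt above, converted back from the big-endian dump to memory order (63 8D 00 00 00 00 01 D8 63 8D 00 00 00 00 02 D8) and the address 0x46B1EC000, this returns 2. In other words, the data look like the correct ramp with the byte stream advanced by 2 bytes. For the first words of the earlier error section at 0x292190000 it returns -1.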
Other observations:
- The error blocks always start at 4 KiB-aligned addresses (the lower 12 bits of the address, i.e. the lowest 3 hex digits, are 0).
- The error blocks seem to occur at random addresses. When starting another validation run, the errors are at different addresses compared to the previous run.
- The frequency of the errors also seems random; sometimes 10-20 GB can be read without a validation error.
- The kernel log does not show any I/O errors or other storage-related issues.
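The alignment and size figures can be cross-checked with a few lines of C. In the sketch below, the 4096-byte unit is an assumption on my side: it matches the 12-bit alignment and is the usual page size on this kind of Linux system, but I have not confirmed that pages are actually involved. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096ULL /* 2^12; assumed page size, see text */

/* True if the lower 12 bits of addr are zero. */
bool is_page_aligned(uint64_t addr)
{
    return (addr & (PAGE_BYTES - 1)) == 0;
}

/* Number of whole 4 KiB units between start (inclusive) and end
 * (exclusive) of an error section. */
uint64_t section_pages(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (end - start) / PAGE_BYTES;
}
```

For the section shown above, 0x29228F000 - 0x292190000 = 0xFF000 = 1044480 bytes, which is exactly 255 units of 4 KiB, i.e. one 4 KiB unit short of 1 MiB.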
Any ideas what could be going wrong? I am happy to test any suggestion. Thanks in advance!
Best regards,
Karsten
