Graduate
July 18, 2024
Solved

FMC: Data integrity in SDRAM


Hi everyone,

I am observing some data corruption in an external SDRAM connected to an STM32 F7 FMC peripheral and I am trying to understand what is happening and how to prevent it.

 

Scenario:

  • Configure the FMC to interface with a Winbond external SDRAM chip
  • Allocate a static memory buffer in the external SDRAM (somewhere in the 0xC000 0000 to 0xCFFF FFFF region)
  • SDRAM write 1: reset the whole buffer to a known value [memset(0xFF)] so that the buffer contents are in a known state
  • SDRAM write 2: immediately after SDRAM write 1, populate the buffer with my data of interest
  • Immediately after SDRAM write 2, read the buffer and notice the data corruption, i.e., some 0xFF bytes are still present in the middle of my data of interest (figure below).

BrMel10_1-1721299239655.png

 

My theory:

  • When we write to memory in the external RAM, we are not actually writing to the external RAM directly. We are writing to the FMC SDRAM memory-mapped region (0xC000 0000 to 0xCFFF FFFF), and the FMC peripheral then updates the external RAM in the background, which takes some time. So when "SDRAM write 2" starts, the FMC is still updating the SDRAM as requested by "SDRAM write 1". The FMC then has to apply both "SDRAM write 1" and "SDRAM write 2" to the SDRAM at the same time, causing a concurrency problem that leaves some 0xFF bytes in the middle of the data of interest when it is read back.

 

Where I need help:

  • Is my theory / understanding described above correct? If not, what am I missing?
  • If so, is there a way to make sure that the "SDRAM write 2" to the SDRAM only occurs once the "SDRAM write 1" has finished?

 

EDIT:

I forgot to mention in the initial description above that if the "SDRAM write 1" is removed, then the data read back matches the "SDRAM write 2".

 

Kind regards

    Best answer by BrMel10

    4 replies

    Graduate II
    July 18, 2024

    The write buffers aren't that deep

    The cache is what implements write-back vs. write-through; with write-back, the data only reaches memory when the cache line is evicted.

    BrMel10Author
    Graduate
    July 18, 2024

    Hi Tesla,
    sorry for my ignorance.

    If I understand you correctly, my theory is wrong: the issue I am facing is not related to concurrency / timing of the FMC accesses to the external SDRAM, but to caching. And if the caching is set to "write-through", the issue should not be observable.

    Could you share how we control the FMC caching on the STM32 F7 MCU?

    How do we check whether "write-through" or "write-back" is set, and how do we change it to "write-through"?

    Super User
    July 18, 2024

    Is this repeatable? Is the same memory region 0xFF after reading it back? If not, I would expect this is a hardware or signal integrity issue. Is this a custom board? Does not look like a cache issue because it appears in the middle of a large buffer.

    BrMel10Author
    Graduate
    July 18, 2024

    Hi TDK,

    Yes. The behavior is repeatable. When reading back the buffer, the 0xFF bytes always appear at the same positions in the buffer.

    Yes. The board is a custom one.

     

    In an attempt to check whether it could be a cache issue, I tried to configure the FMC/SDRAM MPU region as non-cacheable, but the behavior remained the same. For context, the application is based on Zephyr RTOS and I changed this setting through Zephyr's devicetree, from "zephyr,memory-attr = <( DT_MEM_ARM_MPU_RAM )>;" to "zephyr,memory-attr = <( DT_MEM_ARM_MPU_RAM_NOCACHE)>;". I am not yet familiar enough with the kernel to confirm that this change did what I intended. So either the change did not take effect or this really is not a cache issue.
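One way to confirm what the devicetree change actually programmed would be to read back the MPU region attributes in the debugger and decode them. The following is a minimal sketch, assuming the Armv7-M MPU_RASR layout (TEX in bits 21:19, C in bit 17, B in bit 16, ENABLE in bit 0); the enum and the function name `rasr_cache_policy` are hypothetical and are not part of Zephyr or CMSIS. On target, the value would come from selecting the region with MPU->RNR and reading MPU->RASR; the attributes only matter for caching when the D-cache is enabled.

```c
#include <stdint.h>

/* Armv7-M MPU_RASR attribute fields (bit positions per the architecture manual). */
#define RASR_ENABLE   (1u << 0)
#define RASR_B        (1u << 16)
#define RASR_C        (1u << 17)
#define RASR_TEX_POS  19u
#define RASR_TEX_MASK (7u << RASR_TEX_POS)

/* Hypothetical classification of the most common TEX/C/B encodings. */
typedef enum {
    POLICY_REGION_DISABLED,
    POLICY_STRONGLY_ORDERED,   /* TEX=000 C=0 B=0 */
    POLICY_DEVICE,             /* TEX=000 C=0 B=1 */
    POLICY_WRITE_THROUGH,      /* TEX=000 C=1 B=0, no write allocate */
    POLICY_WRITE_BACK,         /* TEX=000 C=1 B=1, no write allocate */
    POLICY_NONCACHEABLE,       /* TEX=001 C=0 B=0, normal memory */
    POLICY_WRITE_BACK_ALLOC,   /* TEX=001 C=1 B=1, write and read allocate */
    POLICY_OTHER               /* remaining encodings: consult the manual */
} cache_policy_t;

/* Decode a raw MPU_RASR value into one of the policies above. */
static cache_policy_t rasr_cache_policy(uint32_t rasr)
{
    if (!(rasr & RASR_ENABLE)) {
        return POLICY_REGION_DISABLED;
    }
    uint32_t tex = (rasr & RASR_TEX_MASK) >> RASR_TEX_POS;
    int c = (rasr & RASR_C) != 0;
    int b = (rasr & RASR_B) != 0;

    if (tex == 0u && !c && !b) return POLICY_STRONGLY_ORDERED;
    if (tex == 0u && !c &&  b) return POLICY_DEVICE;
    if (tex == 0u &&  c && !b) return POLICY_WRITE_THROUGH;
    if (tex == 0u &&  c &&  b) return POLICY_WRITE_BACK;
    if (tex == 1u && !c && !b) return POLICY_NONCACHEABLE;
    if (tex == 1u &&  c &&  b) return POLICY_WRITE_BACK_ALLOC;
    return POLICY_OTHER;
}
```

If the region covering 0xC000 0000 decodes as non-cacheable and the corruption is still present, that would support the conclusion that the cache is not involved.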

     

    One, perhaps important, detail that I forgot to mention in my initial post (apologies for that) is that if the "SDRAM write 1" is removed, then the data read back matches the "SDRAM write 2".

     

    Have you got any suggestions on how I could continue this investigation?

     

    Super User
    July 18, 2024

    Sure it's not a code bug? Weird behavior happens after value 0x80, which is nice and even in terms of binary data. Perhaps show a complete program that exhibits the problem. I'm not sure how the OS would be involved in an SDRAM write. Surely that wouldn't be getting buffered by the OS.

    Change starting address by 4 bytes, what effect does that have?

    There is also the confounding factor that your screenshot shows "1 messages dropped". It is unclear what that means. It would be best to examine the memory directly using the debugger.
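A small self-checking memory test can take the logging path out of the picture. The sketch below is one possible approach, not something posted in this thread: it writes each 32-bit word's own index into the buffer and reports the first word that reads back differently, which tends to expose address aliasing as well as stuck data. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Write each word's own index into it, so every location holds a unique value. */
static void fill_with_index(volatile uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        buf[i] = (uint32_t)i;
    }
}

/* Return the index of the first word that does not hold its own index,
 * or `words` if the whole buffer reads back correctly. */
static size_t first_mismatch(const volatile uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (buf[i] != (uint32_t)i) {
            return i;
        }
    }
    return words;
}
```

On target, `buf` would point into the SDRAM region (for example at 0xC000 0000, the base address mentioned in the original post), ideally with that region non-cacheable so the reads really come from the SDRAM. Setting a breakpoint on a mismatch lets the debugger show the failing address directly.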

    BrMel10AuthorAnswer
    Graduate
    July 19, 2024

    I never discard the possibility of a bug in the code, but this time it is not the case.

     

    Context:

    We developed several boards with two types of SDRAMs: one 8MB and the other 32MB, but both SDRAMs share the same pinout.

    The issue only manifested itself on the boards with the smaller 8MB capacity (not on the 32MB) and two factors may have contributed to it (number 1 below definitely... number 2, not sure):

    1. Software-wise: the "number of rows" and "number of columns" in the FMC configuration were set to 13 and 9, respectively, for both. This was an incorrect setting for the smaller 8MB SDRAM. It should have been 12 and 8.
    2. Hardware-wise: although not needed, the most significant address pin A12 of the MCU FMC and the SDRAM were nevertheless connected/routed on the board.

    Solution:

    Once the "number of rows" and "number of columns" were corrected to 12 and 8, the application worked as expected.
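The row and column counts can be sanity-checked against the chip capacity with a little arithmetic. This is a sketch under two assumptions that are not stated in the thread: that both SDRAMs have 4 internal banks and that the data bus is 16 bits wide. The helper name `sdram_bytes` is hypothetical.

```c
#include <stdint.h>

/* Addressable bytes = 2^row_bits * 2^col_bits * banks * (bus width in bytes). */
static uint64_t sdram_bytes(unsigned row_bits, unsigned col_bits,
                            unsigned banks, unsigned bus_bits)
{
    return ((uint64_t)1 << (row_bits + col_bits)) * banks * (bus_bits / 8u);
}
```

Under those assumptions, `sdram_bytes(12, 8, 4, 16)` gives 8388608 bytes (8 MB) and `sdram_bytes(13, 9, 4, 16)` gives 33554432 bytes (32 MB), so the 13/9 setting describes an address space four times larger than the 8MB chip actually provides.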

    Still curious to know whether it would have worked had the A12 pins not been routed/connected in the board!?

     

    Sorry for the spam, and thanks to everyone for the support.

    Everyone's input steered me in the right direction.

    Kind regards.

    Super User
    July 19, 2024

    Thanks for coming back with the answer.

    Graduate II
    July 18, 2024

    As long as all memory accesses go through the cache, and there is no concurrent access (e.g. from multiple cores), the values you "read" from "memory" should always reflect the last value written. The difference in cache strategies only determines the policy for when writes are flushed to SDRAM. There should be no observable effect on data reads from the software point of view.

    The data you read from "memory" may actually originate from the cache instead of actual SDRAM, but this is what you want, since that's exactly how caches improve performance.