DDR5: What is On-Die ECC?

DRAM Modules | 2023-08-07

                                              

As dynamic random access memory (DRAM) capacity increases with each generation and wafer lithography shrinks to achieve higher speeds and better economies of scale, bit errors are also expected to increase.

On-die ECC is an important feature of DDR5. It provides additional protection by correcting bit errors within the DRAM chip before sending data to the central processing unit (CPU).

What is On-Die ECC? Is It the Same as Traditional ECC?

Error Correction Code (ECC) is one of the most common ways of protecting data integrity by handling memory errors such as bit flips caused by cosmic rays or thermal challenges. ECC memory typically detects double-bit errors and corrects single-bit errors that can occur while data is being transmitted between the controller and the memory. While on-die ECC also ensures higher reliability and reduces defect rates, it is not the same as “traditional” ECC.

Here are some things to note about what it is and what it’s not:

  1. It is not a replacement for DIMM-wide or side-band ECC but is designed to improve the reliability of individual chips. This means that on-die ECC has more to do with reliability than data retention.
  2. As memory cells become denser, each cell holds less charge, which makes it more vulnerable to bit flips. On-die ECC is a way of managing these bit flips so that more cells pass validation during manufacturing. This allows more cells to meet JEDEC-required specifications.
  3. It protects data on the die but not data in transit between the memory module and the CPU/GPU. It does not protect against errors that occur outside the chip, such as those on the memory module or in the memory controller inside the CPU.

Why Does DDR5 Feature On-Die ECC?

As process nodes advanced from 2x to 1-alpha (1α), the fourth generation of 10 nanometer (nm) technology, chip dimensions became smaller, allowing DDR5 to achieve higher speeds and higher capacities. With the smaller process technology, however, the likelihood of single-bit errors in the DRAM memory arrays also increased.

Shrinking lithography lets the same package size accommodate higher densities, which makes room for the additional storage that on-die ECC requires. DDR5 DRAMs allocate 8 additional bits of ECC storage for every 128 bits of data.
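
The 8-for-128 ratio can be checked with a little arithmetic. The sketch below assumes a textbook single-error-correcting (SEC) Hamming code, which needs the smallest r such that 2^r >= k + r + 1 for k data bits. The article does not state which code DDR5 devices actually use, so the helper name `min_check_bits` and the choice of code are illustrative assumptions.

```python
def min_check_bits(data_bits: int) -> int:
    """Smallest r satisfying the Hamming bound 2**r >= data_bits + r + 1."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


DATA_BITS = 128  # data bits per ECC word, from the article
ECC_BITS = 8     # ECC bits per ECC word, from the article

# Extra storage the DRAM must set aside for ECC, as a fraction of the data.
overhead = ECC_BITS / DATA_BITS

print(min_check_bits(DATA_BITS))  # 8
print(f"{overhead:.2%}")          # 6.25%
```

Under that assumption, 8 check bits are exactly the minimum needed to correct any single-bit error in a 128-bit word, at a storage cost of 6.25%.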

During a write operation, the DRAM internally computes the ECC and stores the ECC code within this additional storage space. During a read operation, the DRAM reads both the actual data and the ECC code. If it detects a single-bit discrepancy, it corrects the error before outputting the data, protecting the data against single-bit errors inside the memory array.
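
The write and read steps above can be sketched in software. This is a minimal model, assuming a textbook Hamming single-error-correcting code over 128 data bits and 8 check bits; the actual on-die code is vendor-specific and not described in this article. The function names `write` and `read` are hypothetical stand-ins for what the DRAM does internally.

```python
DATA_BITS = 128
CHECK_BITS = 8
CODEWORD_BITS = DATA_BITS + CHECK_BITS  # 136 stored bits per word

# Positions are 1-based. Powers of two hold check bits, the rest hold data.
CHECK_POSITIONS = [1 << i for i in range(CHECK_BITS)]
DATA_POSITIONS = [p for p in range(1, CODEWORD_BITS + 1) if p & (p - 1) != 0]


def syndrome(codeword: list[int]) -> int:
    """XOR of the positions of all set bits; 0 means no error detected."""
    s = 0
    for pos in range(1, CODEWORD_BITS + 1):
        if codeword[pos]:
            s ^= pos
    return s


def write(data: int) -> list[int]:
    """Model of a write: compute the check bits and store them with the data."""
    codeword = [0] * (CODEWORD_BITS + 1)  # index 0 is unused
    for i, pos in enumerate(DATA_POSITIONS):
        codeword[pos] = (data >> i) & 1
    s = syndrome(codeword)
    for i, pos in enumerate(CHECK_POSITIONS):
        codeword[pos] = (s >> i) & 1  # forces the overall syndrome to 0
    return codeword


def read(codeword: list[int]) -> tuple[int, bool]:
    """Model of a read: correct a single flipped bit, then return the data."""
    word = list(codeword)
    s = syndrome(word)
    corrected = s != 0
    if corrected:
        if s > CODEWORD_BITS:
            raise ValueError("uncorrectable multi-bit error")
        word[s] ^= 1  # the syndrome is the position of the flipped bit
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= word[pos] << i
    return data, corrected


data = 0x0123456789ABCDEFFEDCBA9876543210
stored = write(data)
stored[77] ^= 1  # simulate a single-bit flip inside the memory array
recovered, corrected = read(stored)
print(recovered == data, corrected)  # True True
```

A code of this kind corrects any one flipped bit per 136-bit word. It cannot reliably handle two flips in the same word, which is one reason on-die ECC complements rather than replaces system-level ECC.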

What are the Benefits of On-Die ECC?

  • Economies of scale. On-die ECC lets memory manufacturers like ATP Electronics produce higher-density memory with higher reliability and at lower cost. By correcting single-bit errors internally, it allows for more reliable technology node scaling and is therefore expected to further reduce costs.
  • Improved data integrity. By performing correction during Read commands, before data is output from the DDR5 device, it reduces the burden on system-level error correction and further protects on-die data against single-bit errors within the memory array.
  • Error check and scrub (ECS). This feature allows DDR5 memory to read internal data and write back corrected data if an error has occurred, preventing error accumulation, especially in higher-capacity DRAM modules. It records DRAM defects and provides error counts to the host for better transparency and reliability.
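
The scrub behavior described in the last bullet can be illustrated with a toy model. To keep it short, the sketch uses a small Hamming(7,4) code as a stand-in for the on-die ECC. Real DDR5 devices work on much larger words, and how error counts are exposed to the host is not modeled here. The names `encode`, `syndrome`, and `scrub` are hypothetical.

```python
def syndrome(word: int) -> int:
    """XOR of the 1-based positions of all set bits in a 7-bit codeword."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble: int) -> int:
    """Hamming(7,4): data in positions 3, 5, 6, 7; parity in 1, 2, 4."""
    word = 0
    for i, pos in enumerate((3, 5, 6, 7)):
        if (nibble >> i) & 1:
            word |= 1 << (pos - 1)
    s = syndrome(word)
    for i in range(3):
        if (s >> i) & 1:
            word |= 1 << ((1 << i) - 1)  # set the parity bit at position 2**i
    return word


def scrub(memory: list[int]) -> int:
    """One scrub pass: fix single-bit errors in place, return the error count."""
    errors = 0
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:
            memory[addr] = word ^ (1 << (s - 1))  # write back corrected word
            errors += 1
    return errors


memory = [encode(n) for n in range(16)]
clean = list(memory)

memory[3] ^= 1 << 4  # simulate bit flips in two different words
memory[9] ^= 1 << 0

print(scrub(memory))    # 2
print(memory == clean)  # True
print(scrub(memory))    # 0
```

Scrubbing between the two flips matters when they land in the same word: once the first flip is written back corrected, a later flip is again a single-bit error that the code can fix.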

Conclusion

On-die ECC is an important feature of DDR5. It is, however, different from what is typically known as ECC. Traditional ECC ensures data integrity by handling memory errors while data is being moved, whereas on-die ECC ensures higher reliability of higher-density memory and protects the data inside the memory chip.

With smaller chip dimensions and higher densities in today’s memory modules, shrinking lithography makes memory chips more vulnerable to bit flips. On-die ECC is a way of managing these bit flips so that more cells pass validation during manufacturing.

For more information on ATP’s DDR5 memory modules, visit the ATP website or contact an ATP Representative.
