Technology enthusiasts come across the term ECC (Error Correction Code) from time to time. This technology is mostly used in servers and workstations, that is, in the corporate space. Error-correcting memory is designed to automatically detect and correct errors that may occur in RAM chips.
Electrical or magnetic interference and cosmic rays can corrupt data in memory. The purpose of ECC is to correct corrupted data and, if it cannot be corrected, to report the error to the system. On-die ECC (ODECC), which arrived with DDR5, has caused a lot of controversy and confusion among consumers. To be clear from the start, it is very different from standard ECC. We will first touch briefly on ECC, then discuss how ODECC (ECC on the DRAM die) differs.
What is ECC RAM?
An error correction code is a mathematical scheme that checks whether the data stored in memory is correct. ECC also allows the system to regenerate the correct data in real time in the event of an error.
ECC builds on a more advanced form of parity. Parity is a method that uses a single extra bit (the parity bit) to detect errors in a small group of bits, such as eight bits in RAM. Unfortunately, while a parity bit allows the system to detect an error, it does not provide enough information to locate and correct it.
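To make the limits of parity concrete, here is a minimal Python sketch of even parity over one byte. The function names and the sample value are illustrative assumptions, not taken from any memory controller.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of 1 bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte, stored_parity):
    """True if the byte still matches the parity bit stored alongside it."""
    return parity_bit(byte) == stored_parity


data = 0b10110010                    # illustrative value with four 1 bits
stored = parity_bit(data)            # parity bit written together with the data

one_flip = data ^ 0b00001000         # a single bit flips in memory
two_flips = data ^ 0b00011000        # two bits flip in memory

print(parity_ok(data, stored))       # True: data is intact
print(parity_ok(one_flip, stored))   # False: error detected, position unknown
print(parity_ok(two_flips, stored))  # True: the two flips cancel out unnoticed
```

The single flip is detected, but nothing in the result says which bit changed, so the data cannot be repaired. The double flip passes the check entirely.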
Most systems move data in larger chunks of 64 bits. Instead of producing one parity bit for every eight bits of data, standard ECC typically stores eight check bits for every 64 bits of data, which is why ECC modules before DDR5 are 72 bits wide. Seven of these bits are enough to locate and correct a single flipped bit, and the eighth allows double-bit errors to be detected. The memory controller recomputes the check bits on every read and compares them with the stored ones. If a single bit is incorrect (a one-bit error), the ECC algorithm can regenerate the data. For a two-bit error it can only notify the system, and errors affecting more bits are not guaranteed to be detected.
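The following Python sketch shows one common way to build such a code: an extended Hamming code with seven position-based check bits plus one overall parity bit, giving single-error correction and double-error detection. The bit layout and the names `encode` and `decode` are assumptions chosen for illustration; real memory controllers may use a different code and layout.

```python
PARITY_POS = [1, 2, 4, 8, 16, 32, 64]                          # 7 Hamming check bits
DATA_POS = [p for p in range(1, 72) if p not in PARITY_POS]    # 64 data positions


def encode(data):
    """Encode a 64-bit integer into a 72-bit codeword (list of bits)."""
    code = [0] * 72
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    for p in PARITY_POS:
        # Each check bit covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 72) if i & p) % 2
    code[0] = sum(code) % 2                                    # overall parity bit
    return code


def decode(code):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 72):
        if code[i]:
            syndrome ^= i                 # XOR of the indices of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < 72:
        code[syndrome] ^= 1               # syndrome is the flipped position
        status = "corrected"
    else:
        status = "uncorrectable"          # e.g. two flipped bits
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POS))
    return data, status


word = 0x0123456789ABCDEF
stored = encode(word)

single = list(stored)
single[37] ^= 1                           # one bit flips
double = list(stored)
double[37] ^= 1
double[50] ^= 1                           # two bits flip

print(decode(stored))
print(decode(single))
print(decode(double)[1])
```

With one flipped bit the original word is recovered. With two flipped bits the decoder can only report the problem. With three or more flipped bits, a code of this kind may miscorrect or miss the error.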
What is the Difference Between ECC and On-Die ECC (ODECC)?
Unlike standard ECC, ODECC primarily aims to improve manufacturing yield on advanced process nodes so that DRAM chips can be produced more cheaply. On-die ECC only covers errors that arise in a cell or row inside the chip itself. If a bit flips or data is corrupted while it is moving from the chip to the cache or CPU, on-die ECC does not correct it. Standard ECC can correct data corruption both within the cell and in transit to another device.
DDR5 divides each memory module into two independent 32-bit addressable subchannels to increase efficiency and reduce data access latency for the memory controller. The total data width of a DDR5 module is still 64 bits, the same as before, but splitting the bus into two 32-bit subchannels improves overall performance. Server-class memory (RDIMMs) provides 40 bits per subchannel, or 80 bits per rank, because 8 bits are added to each subchannel for ECC support. Dual-rank modules have four 32-bit subchannels.
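The bus widths above can be checked with a few lines of arithmetic. This Python sketch uses only the figures from the paragraph; the constant names are illustrative.

```python
DATA_BITS_PER_SUBCHANNEL = 32   # data width of one DDR5 subchannel
ECC_BITS_PER_SUBCHANNEL = 8     # extra bits on a server-class ECC module
SUBCHANNELS_PER_RANK = 2

subchannel_width = DATA_BITS_PER_SUBCHANNEL + ECC_BITS_PER_SUBCHANNEL
rank_width = subchannel_width * SUBCHANNELS_PER_RANK
data_width = DATA_BITS_PER_SUBCHANNEL * SUBCHANNELS_PER_RANK
ecc_overhead = ECC_BITS_PER_SUBCHANNEL / DATA_BITS_PER_SUBCHANNEL

print(subchannel_width)         # 40 bits per subchannel
print(rank_width)               # 80 bits per rank
print(data_width)               # 64 data bits per rank
print(f"{ecc_overhead:.0%}")    # 25% extra bits for ECC
```

Because every 32-bit subchannel carries its own 8 ECC bits, the ECC overhead relative to the data bits is 25%.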
On-die ECC is a new feature designed to correct bit errors inside the DRAM chip. As with CPUs and GPUs, the manufacturing technologies used to produce RAM are evolving. As new lithography techniques increase the density of DRAM chips, the potential for data leakage, and with it bit errors, also increases. ECC is integrated into DDR5 chips to correct errors within the chip, improving reliability and reducing defect rates.
This technology cannot correct errors outside the chip, including errors that occur on the bus between the module and the memory controller inside the CPU. The ECC-enabled processors used in servers and workstations implement coding that can correct single-bit or multi-bit errors on the fly.
In short, DDR5’s on-die ECC does not correct errors on the DDR channel. Businesses will therefore continue to use standard sideband ECC alongside DDR5’s ODECC. The scope of on-die ECC is much narrower.
A Brief History of ECC
Years ago, Intel considered ECC exclusive to the professional segment and chose to support it only on Xeon processors. AMD changed that by adding ECC support to its Ryzen processors. Even so, ECC came at a higher cost, and finding suitable ECC-capable RAM was a problem of its own. With the DDR5 standard, this changes: ECC, in its on-die form, is now a normal part of DDR5.
The new generation of processors uses ECC (or another kind of error checking) internally to keep data in the cache and other components consistent. However, without ECC-capable RAM, the operating system cannot check data as it moves between the CPU and RAM or while it sits in RAM.
The Importance of ECC
The operating system checks memory consistency to some extent, but this process is slow and not entirely reliable. As a result, the operating system cannot detect every problem with the data stored in RAM. In other words, it cannot guarantee 100% that operations are performed on the right data or that the data is written to the right file.
In everyday use, this is not so important. For example, an invalid character in a Word document does not cause major problems. In banking, however, every step of a transaction is critical.
When Windows detects a data inconsistency, it usually shows a blue screen error. As noted above, though, the operating system’s checks are not entirely reliable.