At an earlier event, AMD previewed its new EPYC Milan-X processors, which are built around 3D V-Cache technology. Full details and performance figures for these large chips, which are now going on sale, have been shared. Designed for servers and data centers, the chips use the existing Zen 3 cores. The biggest improvement is on the cache side.

Thanks to the new technology, each chip offers up to 768 MB of L3 cache, or 804 MB of total cache when L1 and L2 are counted. This means dual-socket servers with 1.5 GB of L3 cache in a single system will soon be available. AMD also presented benchmark results for several server workloads that show large performance gains.

AMD first introduced 3D V-Cache technology at Computex 2021. 3D V-Cache was developed to triple the amount of L3 cache per chip by stacking a 7nm SRAM cache layer vertically on top of the compute dies.

EPYC Milan-X Basic Details

AMD placed a single 6x6mm layer of L3 cache directly on top of the L3 cache already present on each CCD (Core Complex Die, the compute chiplet). Each CCD carried 32 MB of L3 cache before the change. The 3D-stacked layer adds another 64 MB, bringing the total to 96 MB per CCD. EPYC Milan-X chips scale up to 64-core models with eight CCDs, for a total of 768 MB of L3 cache per chip.
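The cache arithmetic above can be checked with a short sketch. The constant and function names are illustrative only and are not AMD terminology; the values are the figures quoted in this article.

```python
# Cache sizes in MB, taken from the figures quoted in the article.
BASE_L3_PER_CCD_MB = 32      # L3 already present on each Zen 3 CCD
STACKED_L3_PER_CCD_MB = 64   # extra L3 from the 3D V-Cache layer
CCDS_PER_CHIP = 8            # CCD count of the 64-core Milan-X models


def l3_per_ccd_mb() -> int:
    """Total L3 per CCD once the stacked layer is added."""
    return BASE_L3_PER_CCD_MB + STACKED_L3_PER_CCD_MB


def l3_per_chip_mb(ccds: int = CCDS_PER_CHIP) -> int:
    """Total L3 per processor for a given CCD count."""
    return l3_per_ccd_mb() * ccds


def l3_per_system_gb(sockets: int = 2) -> float:
    """Total L3 across all sockets, in GB (1 GB = 1024 MB)."""
    return l3_per_chip_mb() * sockets / 1024


if __name__ == "__main__":
    print(l3_per_ccd_mb())      # 96
    print(l3_per_chip_mb())     # 768
    print(l3_per_system_gb())   # 1.5
```

The 1.5 GB figure for a dual-socket server follows directly from two 768 MB chips.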

The vertically stacked L3 cache increases overall latency by roughly 10%. Otherwise, AMD continues to use the same Zen 3 cores. The control circuitry for 3D V-Cache was built in as a forward-looking choice early in the design, so existing EPYC Milan dies serve as the building blocks and the chips fit the SP3 socket used by current EPYC servers.

AMD underlined that the solderless hybrid bonding technique behind 3D V-Cache brings several benefits: a 200x increase in interconnect density compared to 2D chips, a 15x density increase compared to micro-bump 3D packaging, and a 3x gain in energy efficiency. The company also says hybrid bonding improves thermals, transistor density and interconnect pitch compared to other 3D approaches, making it the most flexible active-on-active silicon stacking technology.

The chipmaker also says it maintains compatibility with SP3 systems based on the Milan family. All that is needed for Milan-X support on existing Milan servers is a BIOS update.

EPYC Milan-X 7003X Processor Features

AMD is launching four Milan-X models with its new technology. The letter “X” at the end of the model number indicates that the part has AMD 3D V-Cache.

Each Milan-X model has 8 CCDs (Core Complex Dies), each with 96 MB of L3 cache, for a total of 768 MB. With Zen 3’s shared L3 cache design, any core can access the entire 96 MB cache of its CCD.

The company says it maintains performance leadership with its new chips, and prices are up 20% compared to the Milan family.

The flagship AMD EPYC 7773X will have 64 cores, 128 threads and a maximum TDP of 280W. It is listed with a 2.2 GHz base and 3.5 GHz boost frequency, and 3D V-Cache brings its L3 cache to 768 MB. The chip carries 256 MB of L3 cache as standard, while the stacked L3 SRAM adds another 512 MB. This means each Zen 3 CCD gets an additional 64 MB of L3 cache.

The second model, the EPYC 7573X, has 32 cores and 64 threads with a 280W TDP. The base clock is 2.8 GHz, while the boost clock goes up to 3.6 GHz. As the table shows, the lineup is rounded out by the 24-core AMD EPYC 7473X and the 16-core AMD EPYC 7373X.

All four Milan-X processors also come with 8 memory channels, 768 MB of L3 cache and 128 PCIe 4.0 lanes.

CPU            | Cores / Threads | Base Clock | Boost    | LLC (3D SRAM) | L3           | L2    | TDP      | Price
AMD EPYC 7773X | 64 / 128        | 2.2 GHz    | 3.50 GHz | 64 MB per CCD | 512 + 256 MB | 32 MB | 225/280W | $8800
AMD EPYC 7763  | 64 / 128        | 2.45 GHz   | 3.50 GHz | -             | 256 MB       | 32 MB | 225/280W | $7890
AMD EPYC 7573X | 32 / 64         | 2.80 GHz   | 3.60 GHz | 64 MB per CCD | 512 + 256 MB | 32 MB | 225/280W | $5590
AMD EPYC 75F3  | 32 / 64         | 2.95 GHz   | 4.00 GHz | -             | 256 MB       | 32 MB | 225/280W | $3761
AMD EPYC 7473X | 24 / 48         | 2.80 GHz   | 3.70 GHz | 64 MB per CCD | 512 + 256 MB | 12 MB | 225/280W | $3900
AMD EPYC 74F3  | 24 / 48         | 3.20 GHz   | 4.00 GHz | -             | 128 MB       | 12 MB | 225/240W | $2010
AMD EPYC 7373X | 16 / 32         | 3.05 GHz   | 3.80 GHz | 64 MB per CCD | 512 + 256 MB | 8 MB  | 225/280W | $4185
AMD EPYC 73F3  | 16 / 32         | 3.50 GHz   | 4.00 GHz | -             | 128 MB       | 8 MB  | 225/240W | $1565

EPYC Milan-X Processor Configuration

In the image below you can see the structure of the new processors. The diagram on the left is a logical view of the 3rd Gen EPYC SoC, with a central die surrounded by 8 CCDs.

At the top right is an expanded view of a Milan CCD with 8 Zen 3 cores, each with its own L2 cache, and 32 MB of shared L3 cache. Below it is a Milan-X CCD with the same Zen 3 cores, but with triple the shared L3 cache.

With a shared cache, any core can use as much of the cache as it needs. Each of the 8 cores can use 12 MB, or a single core can access the entire 96 MB.
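The two extremes mentioned above, 12 MB for each of 8 cores or 96 MB for a single core, are the end points of a simple division. The sketch below assumes an even split among the active cores purely for illustration; the cache is shared on demand rather than statically partitioned, and the function name is hypothetical.

```python
L3_PER_CCD_MB = 96   # shared L3 on a Milan-X CCD, as quoted in the article
CORES_PER_CCD = 8    # Zen 3 cores per CCD


def even_share_mb(active_cores: int) -> float:
    """L3 per core if the active cores split the shared cache evenly.

    This is an idealized model, not how the hardware allocates cache lines.
    """
    if not 1 <= active_cores <= CORES_PER_CCD:
        raise ValueError(f"active_cores must be between 1 and {CORES_PER_CCD}")
    return L3_PER_CCD_MB / active_cores


if __name__ == "__main__":
    for cores in (1, 2, 4, 8):
        print(f"{cores} active core(s): {even_share_mb(cores)} MB each")
```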

Performance

AMD also shared some performance figures. The applications shown here are usually deployed on clusters, but the first comparison looks at CPU performance on a single node. The next slide has comparisons on a per-core basis.

Here, the company compared Intel’s 40-core Xeon 8380 with the 64-core Milan-X 7773X. Note also that the data is based on performance results from Intel.

Milan-X achieved an average advantage of 44%, with a peak of 93% on the Neon model. Ansys Fluent is a computational fluid dynamics (CFD) application and, in fact, the largest CFD application on the market. It is used to design everything from airplanes, trains and cars to boats and consumer products.

The Fluent benchmark package includes 15 different test scenarios that Ansys believes best represent the range of designs customers can model. Across these 15 workloads, the Milan-X processor shows an average advantage of 47%, with a peak of 117%.

Ansys LS-DYNA is another structural analysis application that models dynamic collision events. The LS-DYNA test suite includes four different models. Milan-X runs on average 69% faster than Intel’s processor.

Finally, Ansys CFX is another computational fluid dynamics application, aimed at special use cases. The CFX benchmark package includes five separate tests. The red team emphasizes that Milan-X shows an average advantage of 96%, with a peak of 125%.

The same four applications appear in this slide, but now the tests are on a per-core basis. AMD picked Intel’s high-performance 32-core Xeon 8362 and compared it with the 32-core 7573X. The red team says its chip offers double-digit performance improvements in core-to-core benchmarks.

Finally, AMD shared some results using the example of an organization that does its work with CFX. This company uses CFX to design, model and test its products. AMD says a system of 20 Intel servers using Intel’s fastest Xeon processor can handle up to 4,600 jobs a day.

The 32-core Milan-X chip can do the same work with half as many servers while using 49% less power. According to the red team’s calculations, the power used per year drops from 178 kWh to 91 kWh. AMD puts the resulting saving for the organization at 51%, with licensing costs to be considered in addition.
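The consolidation figures above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in the article; the helper name `percent_reduction` is illustrative, and the count of 10 Milan-X servers is inferred from "half" of the 20 Intel servers.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the relative reduction from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (before - after) / before * 100


# Figures quoted in the article.
INTEL_SERVERS = 20
MILAN_X_SERVERS = INTEL_SERVERS // 2   # "half of these servers" (inferred)
POWER_BEFORE = 178                     # yearly power figure for the Intel setup
POWER_AFTER = 91                       # yearly power figure for the Milan-X setup

if __name__ == "__main__":
    server_cut = percent_reduction(INTEL_SERVERS, MILAN_X_SERVERS)
    power_cut = percent_reduction(POWER_BEFORE, POWER_AFTER)
    print(f"Servers: {server_cut:.0f}% fewer")
    print(f"Power:   {power_cut:.0f}% less")
```

The power figures work out to a reduction of just under 49%, which matches the percentage AMD quotes.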