NVIDIA unveiled details of the Hopper architecture and the next-generation H100 GPU at its GPU Technology Conference (GTC). The company has been known to be working on its next generation of GPUs for some time, but now we have concrete details. The Hopper architecture and the H100 GPU are entirely separate from the consumer-focused Ada Lovelace architecture that will power future GeForce cards.

The green team has not yet released any details about the Ada Lovelace architecture. The Hopper H100 will replace the Ampere A100, which in turn replaced the Volta V100; all of these are data-center products. NVIDIA will compete with AMD accelerators such as the Instinct MI250/MI250X and the newly announced Instinct MI210, and it aims to consolidate its leadership in HPC.

The H100 is designed for AI-focused supercomputers and brings significant upgrades over the current A100. The chip packs 80 billion transistors and is built on TSMC's 4N process. This should not be confused with TSMC's general-purpose N4 (4nm) process: 4N is a custom manufacturing technology for NVIDIA. For comparison, the previous-generation A100 had 54 billion transistors, so the H100 has roughly 48% more.

NVIDIA did not disclose core counts or clock speeds, but it did share other details. The H100 supports the company's fourth-generation NVLink interface, which offers up to 900 GB/s of bandwidth. PCIe 5.0 is also supported for systems that don't use NVLink, providing up to 128 GB/s. The updated NVLink delivers 1.5x the bandwidth of the A100's, while PCIe 5.0 doubles the bandwidth of PCIe 4.0. Overall, the H100 offers about 50% more memory and interface bandwidth than its predecessor.
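The PCIe figures follow from the per-lane signaling rate. The sketch below assumes an x16 slot and counts both directions, which is how headline numbers like 128 GB/s are usually quoted; PCIe 4.0 runs at 16 GT/s per lane and PCIe 5.0 at 32 GT/s, both with 128b/130b encoding. The function name `pcie_bandwidth_gbs` is purely illustrative.

```python
# Nominal PCIe link bandwidth from per-lane transfer rate and line encoding.
# PCIe 3.0 and later use 128b/130b encoding; packet/protocol overhead is ignored.

def pcie_bandwidth_gbs(gt_per_s: float, lanes: int = 16, encoded: bool = True) -> float:
    """Return one-direction bandwidth in GB/s for a PCIe link (hypothetical helper)."""
    efficiency = 128 / 130 if encoded else 1.0
    bits_per_s = gt_per_s * 1e9 * lanes * efficiency
    return bits_per_s / 8 / 1e9  # bits -> bytes, then bytes -> GB

gen4_raw = pcie_bandwidth_gbs(16, encoded=False)  # 32 GB/s per direction
gen5_raw = pcie_bandwidth_gbs(32, encoded=False)  # 64 GB/s per direction
gen5_eff = pcie_bandwidth_gbs(32)                 # ~63 GB/s after encoding

print(f"PCIe 4.0 x16: {2 * gen4_raw:.0f} GB/s both directions (raw)")
print(f"PCIe 5.0 x16: {2 * gen5_raw:.0f} GB/s both directions (raw)")
print(f"PCIe 5.0 x16: {2 * gen5_eff:.1f} GB/s both directions after 128b/130b")
print(f"Gen5 / Gen4: {gen5_raw / gen4_raw:.1f}x")
```

The raw x16 numbers reproduce the 128 GB/s figure and the doubling over PCIe 4.0; encoding alone trims about 1.5% off that.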

Memory and I/O matter, of course, but Hopper has other important strengths. The H100 delivers up to 2,000 TFLOPS of FP16, up to 1,000 TFLOPS of TF32, and 60 TFLOPS of general-purpose FP64 compute. In all three cases, that is roughly three times the A100. Hopper also adds FP8 support, reaching up to 4,000 TFLOPS, about six times the A100 (which has no native FP8 support and must fall back to FP16). To get the most out of these formats, NVIDIA's new Transformer Engine automatically switches between FP8 and FP16 depending on the workload.
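As a rough check on the "threefold" and "six times" claims, the sketch below divides the quoted H100 figures by the A100's. The A100 values are not from this article: they are assumed from NVIDIA's A100 specifications (FP16 and TF32 Tensor Core rates with structured sparsity, plus the FP64 Tensor Core rate), on the assumption that the H100 numbers were quoted on the same basis.

```python
# H100 figures as quoted in the article, in TFLOPS.
h100 = {"FP16": 2000, "TF32": 1000, "FP64": 60, "FP8": 4000}

# Assumed A100 figures (TFLOPS): FP16/TF32 Tensor Core with sparsity,
# FP64 Tensor Core. The A100 has no FP8 path, so FP8 is compared
# against its FP16 rate, as the article does.
a100 = {"FP16": 624, "TF32": 312, "FP64": 19.5, "FP8": 624}

ratios = {fmt: h100[fmt] / a100[fmt] for fmt in h100}
for fmt, ratio in ratios.items():
    print(f"{fmt}: {ratio:.1f}x")
```

Under these assumptions the ratios come out near 3x for FP16, TF32, and FP64, and a little over 6x for FP8 versus the A100's FP16.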

The green team is also adding new DPX instructions designed to accelerate dynamic programming. These can help a wide range of algorithms, including route optimization and genomics, and NVIDIA claims up to 7x the performance of previous-generation GPUs and up to 40x the performance of CPUs on these algorithms.
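Dynamic programming builds a solution from a table of smaller subproblems, and its inner loop is typically an add followed by a min or max, the kind of operation DPX is described as speeding up. As an illustration, here is a plain Python version of Floyd-Warshall, an all-pairs shortest-path algorithm of the kind used for route optimization. It runs on the CPU and does not use DPX; the small road network is made up.

```python
import math

def floyd_warshall(weights: list[list[float]]) -> list[list[float]]:
    """All-pairs shortest paths; weights[i][j] is the edge cost, math.inf if none."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):                  # allow paths that pass through node k
        for i in range(n):
            for j in range(n):
                # The core DP step: an add followed by a min.
                candidate = dist[i][k] + dist[k][j]
                if candidate < dist[i][j]:
                    dist[i][j] = candidate
    return dist

INF = math.inf
# Hypothetical road network: symmetric travel costs between four locations.
roads = [
    [0, 4, INF, 10],
    [4, 0, 3, INF],
    [INF, 3, 0, 2],
    [10, INF, 2, 0],
]
print(floyd_warshall(roads)[0][3])  # 9, via 0 -> 1 -> 2 -> 3
```

The triple loop over the table is what makes these algorithms expensive at scale, which is why hardware help on the inner add-and-compare step matters.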

All of these changes matter for the supercomputing and AI industries, but they come at a cost. Despite the more advanced manufacturing process, the TDP of the H100 SXM variant rises to 700W, compared with 400W for the A100 SXM module. That is 75% more power for improvements ranging from 50% to 500%, depending on the workload. Overall, the H100 is expected to be two to three times faster than the A100, but power consumption appears to have increased considerably.
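To put the power figures side by side, the sketch below converts speedups into performance per watt relative to the A100. It assumes both chips run at their full SXM TDP, and it reads "improvements from 50% to 500%" as speedups of 1.5x to 6x; both are simplifications.

```python
A100_SXM_W = 400
H100_SXM_W = 700
power_ratio = H100_SXM_W / A100_SXM_W  # 1.75, i.e. 75% more power

def perf_per_watt_gain(speedup: float) -> float:
    """Performance per watt relative to the A100, assuming both run at full TDP."""
    return speedup / power_ratio

# Endpoints of the 1.5x-6x range (assumed reading), plus the 2x-3x overall estimate.
for speedup in (1.5, 2.0, 3.0, 6.0):
    print(f"{speedup:.1f}x faster -> {perf_per_watt_gain(speedup):.2f}x performance per watt")
```

Under these assumptions, a 1.5x gain at 1.75x the power is a net loss in efficiency, while the 2x to 3x overall range works out to roughly 1.1x to 1.7x more performance per watt.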

Overall, the chipmaker claims that the H100 scales better than the A100 and can deliver up to 9x faster AI training. It also offers 16x to 30x higher inference performance, using Megatron 530B as the benchmark. Finally, in HPC applications such as 3D FFT (fast Fourier transform) and genome sequencing, NVIDIA says the H100 is up to 7x faster than the A100.

As with the A100, Hopper GPUs will initially be offered in a new rack-mounted server, the DGX H100. Each DGX H100 system includes eight H100 GPUs, delivering 32 PFLOPS of AI compute and 0.5 PFLOPS of FP64, with 640GB of HBM3 memory.
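The DGX H100 totals line up with the per-GPU figures quoted earlier. The sketch assumes the 32 PFLOPS "AI" figure is the 4,000 TFLOPS FP8 rate multiplied by eight GPUs; the 640GB of HBM3 then works out to 80GB per GPU.

```python
GPUS_PER_DGX = 8
H100_FP8_TFLOPS = 4000   # per-GPU FP8 figure quoted in the article
H100_FP64_TFLOPS = 60    # per-GPU FP64 figure quoted in the article
DGX_HBM3_GB = 640

ai_pflops = GPUS_PER_DGX * H100_FP8_TFLOPS / 1000     # TFLOPS -> PFLOPS
fp64_pflops = GPUS_PER_DGX * H100_FP64_TFLOPS / 1000  # rounds to the quoted 0.5
hbm_per_gpu_gb = DGX_HBM3_GB / GPUS_PER_DGX

print(ai_pflops, fp64_pflops, hbm_per_gpu_gb)  # 32.0 0.48 80.0
```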

Grace


The company will also offer “Grace Hopper Superchips”, which combine a Grace CPU and a Hopper GPU on a single module with a 900 GB/s coherent interface between the two. While Hopper will arrive in the third quarter of this year, the Grace Hopper Superchip won’t reach the market until the third quarter of 2023. It is unclear whether it will be used in future DGX servers.
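For a sense of scale, the sketch below compares the 900 GB/s CPU-GPU link with the 128 GB/s PCIe 5.0 figure and estimates ideal transfer times for a hypothetical 80 GB payload. It ignores latency and protocol overhead and assumes the full quoted bandwidth is available to the transfer; if both figures count traffic in both directions, a one-way copy takes about twice as long, although the ratio stays the same.

```python
COHERENT_LINK_GBS = 900   # CPU-GPU link bandwidth quoted for Grace Hopper
PCIE5_X16_GBS = 128       # PCIe 5.0 figure quoted for the H100

def transfer_seconds(gigabytes: float, link_gbs: float) -> float:
    """Ideal transfer time at peak link bandwidth (hypothetical helper)."""
    return gigabytes / link_gbs

payload_gb = 80  # hypothetical payload size
print(f"Bandwidth ratio: {COHERENT_LINK_GBS / PCIE5_X16_GBS:.1f}x")
print(f"900 GB/s link: {transfer_seconds(payload_gb, COHERENT_LINK_GBS):.3f} s")
print(f"PCIe 5.0 x16:  {transfer_seconds(payload_gb, PCIE5_X16_GBS):.3f} s")
```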

RTX 40 and Ada Lovelace

The previous-generation A100 used TSMC's 7N process (a version of N7 exclusive to NVIDIA), unlike the Samsung 8N process used for the rest of NVIDIA’s Ampere lineup. Rumor has it that the consumer-bound Ada Lovelace GPUs will use a slightly less advanced process than Hopper. Expectations center on TSMC N5, which is not very different from 4N.

The massive performance gains of the Hopper architecture may be a harbinger of what GeForce graphics cards can offer. NVIDIA probably won’t use HBM3 for its Ada Lovelace GPUs, but the two-to-three-times performance gap between the H100 and the A100 hints at the scale of generational improvement that is possible. With a new manufacturing process and architectural enhancements, the GeForce RTX 4090 could be twice as fast as the RTX 3090. Rumors have repeatedly suggested that next-generation GeForce RTX 40 series cards could have a TGP of up to 600W. So, as with Hopper, Ada Lovelace cards may pair high performance with high power consumption.