Nvidia releases its largest chip yet
Source: compiled from CRN.
At the 2024 Supercomputing Conference, the AI computing giant unveiled what may be the largest AI "chip" to date, the quad-GPU Grace Blackwell GB200 NVL4 Superchip, while also announcing the general availability of its H200 NVL PCIe module for use in enterprise servers running AI workloads.
This is yet another sign that the company is expanding the traditional definition of a semiconductor chip to drive its AI computing ambitions.
The GB200 NVL4, announced Monday at the Supercomputing 2024 event, goes a step further than the Grace Blackwell GB200 Superchip that Nvidia introduced in March as its new flagship AI computing product. The newly available H200 NVL PCIe modules, meanwhile, make the H200 GPUs launched earlier this year better suited to standard server platforms.
The GB200 NVL4 Superchip is designed for a "single-server Blackwell solution" running high-performance computing and AI workloads, Dion Harris, Nvidia's director of accelerated computing, said in a briefing with reporters last week.
Those server solutions include Hewlett Packard Enterprise's Cray Supercomputing EX154n accelerator blade, which was announced last week and can accommodate up to 224 B200 GPUs. The Cray blade servers are expected to be available by the end of 2025, according to HPE.
According to images shared by Nvidia, the GB200 Superchip looks like a sleek black motherboard that connects an Arm-based Grace CPU to two B200 GPUs based on Nvidia's new Blackwell architecture. The NVL4 product appears to double the surface area of that Superchip, with two Grace CPUs and four B200 GPUs mounted on a larger board.
The GB200 Grace Blackwell NVL4 Superchip is a more powerful take on the standard dual-GPU GB200 Superchip, pairing up to four B200 Blackwell GPUs interconnected via NVLink with two Arm-based Grace CPUs on a single board. The solution is designed for mixed HPC and AI workloads and offers up to 1.3 TB of coherent memory. Nvidia advertises the GB200 NVL4 as delivering 2.2 times the simulation performance, 1.8 times the training performance, and 1.8 times the inference performance of its direct predecessor, the GH200 NVL4 Grace Hopper Superchip.
Like the standard GB200 Superchip, the GB200 NVL4 uses Nvidia's fifth-generation NVLink inter-chip interconnect for high-speed communication between the CPU and GPU. The company previously said that this generation of NVLink can achieve a bidirectional throughput of 1.8 TB/s per GPU.
Nvidia says the GB200 NVL4 Superchip has 1.3 TB of coherent memory that can be shared between four B200 GPUs via NVLink.
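To put those figures in context, the following is a rough Python sketch based only on the numbers quoted above (1.3 TB of coherent memory, 1.8 TB/s of bidirectional NVLink bandwidth per GPU); it is simple arithmetic, not a benchmark or Nvidia-provided data.

```python
# Back-of-envelope arithmetic using the figures quoted in this article:
# 1.3 TB of coherent memory shared by four B200 GPUs, and 1.8 TB/s of
# bidirectional NVLink throughput per GPU (~0.9 TB/s in each direction).
# Illustrative only, not a benchmark.

COHERENT_MEMORY_TB = 1.3           # total pool shared by the four GPUs
NVLINK_BIDIR_TB_PER_S = 1.8        # per-GPU bidirectional NVLink bandwidth
NVLINK_ONE_WAY_TB_PER_S = NVLINK_BIDIR_TB_PER_S / 2

# Time for one GPU to read the entire coherent pool once, assuming it could
# sustain the full one-way NVLink rate (an idealized upper bound).
full_sweep_s = COHERENT_MEMORY_TB / NVLINK_ONE_WAY_TB_PER_S
print(f"Idealized full 1.3 TB sweep by one GPU: ~{full_sweep_s:.2f} s")

# Per-GPU share of the pool if data were spread evenly across the four GPUs.
per_gpu_share_gb = COHERENT_MEMORY_TB / 4 * 1000
print(f"Per-GPU share of the coherent pool: ~{per_gpu_share_gb:.0f} GB")
```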
To demonstrate the computing power of the GB200 NVL4, the company compared it to the previously released GH200 NVL4 Superchip, which was originally launched a year ago as the Quad GH200 and consists of four Grace Hopper GH200 Superchips. The GH200 Superchip contains a Grace CPU and a Hopper H200 GPU.
Compared to the GH200 NVL4, the GB200 NVL4 is 2.2 times faster for simulation workloads using the MILC code, 80 percent faster at training the 37-million-parameter GraphCast weather-forecasting AI model, and 80 percent faster at inference on the 7-billion-parameter Llama 2 model using 16-bit floating-point precision.
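For readers translating that marketing language into multipliers, here is a minimal sketch; the 100-minute baseline is a made-up placeholder used only to show how "2.2 times" and "80 percent faster" would play out in wall-clock terms.

```python
# Converting the claimed gains over the GH200 NVL4 into speedup multipliers.
# The 100-minute baseline is a hypothetical placeholder, not a measurement.

claims = {
    "MILC simulation":      2.2,  # "2.2 times faster"
    "GraphCast training":   1.8,  # "80% faster" => 1.8x
    "Llama 2 7B inference": 1.8,  # "80% faster" => 1.8x
}

hypothetical_gh200_minutes = 100.0

for workload, speedup in claims.items():
    projected = hypothetical_gh200_minutes / speedup
    print(f"{workload}: {hypothetical_gh200_minutes:.0f} min -> ~{projected:.0f} min "
          f"(claimed {speedup:.1f}x)")
```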
The company did not provide any further specifications or performance claims.
Nvidia's partners are expected to unveil new Blackwell-based solutions at the 2024 Supercomputing Conference this week, Harris said in a briefing with reporters.
“The Blackwell rollout has gone smoothly thanks to the reference architecture, which enables partners to quickly bring products to market while adding their own custom features,” he said.
Nvidia releases H200 NVL PCIe module
In addition to launching the GB200 NVL4 Superchip, Nvidia also announced that its previously announced H200 NVL PCIe card will be available in partner systems next month.
The NVL module is built around Nvidia's H200 GPU, which launched earlier this year in the SXM form factor for Nvidia's DGX systems as well as server vendors' HGX systems. The H200 is the successor to the company's H100, uses the same Hopper architecture, and helped make Nvidia a major provider of AI chips for generative AI workloads.
Unlike standard PCIe designs, the H200 NVL consists of two or four PCIe cards connected together using Nvidia's NVLink interconnect bridge, giving each GPU a bidirectional throughput of 900 GB/s. Its predecessor, the H100 NVL, only connected two cards via NVLink.
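As a rough illustration of why the NVLink bridge matters, the sketch below compares the quoted 900 GB/s per-GPU figure against an assumed PCIe Gen 5 x16 link of roughly 128 GB/s bidirectional; the PCIe number is a common ballpark, not something stated in the article.

```python
# Rough comparison: per-GPU GPU-to-GPU bandwidth via the NVLink bridge
# (900 GB/s bidirectional, per the article) versus routing the same traffic
# over a host PCIe Gen 5 x16 link (~128 GB/s bidirectional is an assumed
# ballpark, not an article figure).

NVLINK_BRIDGE_GB_PER_S = 900.0   # bidirectional, per GPU, from the article
PCIE_GEN5_X16_GB_PER_S = 128.0   # bidirectional, assumed approximation

ratio = NVLINK_BRIDGE_GB_PER_S / PCIE_GEN5_X16_GB_PER_S
print(f"NVLink bridge vs. PCIe Gen 5 x16 (bidirectional): ~{ratio:.0f}x")
```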
It is also air-cooled, whereas the H200 SXM offers a liquid-cooling option.
Harris said the dual-slot PCIe form factor makes the H200 NVL "ideal for data centers with low-power, air-cooled enterprise rack designs, with flexible configurations to deliver acceleration for every AI and HPC workload, no matter the size."
“Companies can use existing racks and choose the number of GPUs that best suits their needs, selecting from one, two, four or even eight GPUs, with NVLink domains scalable up to four,” he said. “Enterprises can use the H200 NVL to accelerate AI and HPC applications while improving energy efficiency by reducing power consumption.”
Like its SXM cousin, the H200 NVL comes with 141 GB of high-bandwidth memory and 4.8 TB/s of memory bandwidth, up from 94 GB and 3.9 TB/s for the H100 NVL, but it has a maximum thermal design power of just 600 watts, versus the 700-watt maximum of the H200 SXM version, according to the company.
This results in the H200 NVL having slightly lower peak performance than the SXM module. For example, the H200 NVL tops out at 30 teraflops of 64-bit floating point (FP64) compute and 3,341 trillion 8-bit integer (INT8) operations per second, versus 34 FP64 teraflops and 3,958 trillion INT8 operations per second for the SXM version. (A teraflop is one trillion floating-point operations per second; integer throughput is measured in the analogous trillions of operations per second, or TOPS.)
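Taking the numbers reported above at face value, a short sketch makes the trade-off explicit: the NVL card gives up roughly 12 to 16 percent of peak throughput in exchange for a roughly 14 percent lower power ceiling. The figures are the article's; the percentage math is just illustration.

```python
# Spec deltas between the H200 NVL and H200 SXM as reported above.
# The percentage calculations are illustrative arithmetic only.

h200_nvl = {"TDP (W)": 600, "FP64 (TFLOPS)": 30, "INT8 (TOPS)": 3341}
h200_sxm = {"TDP (W)": 700, "FP64 (TFLOPS)": 34, "INT8 (TOPS)": 3958}

for key, nvl_value in h200_nvl.items():
    sxm_value = h200_sxm[key]
    delta_pct = (nvl_value - sxm_value) / sxm_value * 100
    print(f"{key}: NVL {nvl_value} vs. SXM {sxm_value} ({delta_pct:+.0f}%)")
```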
Nvidia says the H200 NVL is 70 percent faster than the H100 NVL at inference on the 70-billion-parameter Llama 3 model. As for HPC workloads, the company says the H200 NVL is 30 percent faster at reverse time migration modeling.
The H200 NVL comes with a five-year subscription to the Nvidia AI Enterprise software platform, which comes with Nvidia NIM microservices to accelerate AI development.
Reference Links
https://www.crn.com/news/components-peripherals/2024/nvidia-reveals-4-gpu-gb200-nvl4-superchip-releases-h200-nvl-module
END