A supercomputing cluster with 131,072 GPUs

Latest update time:2024-09-13


Source: Compiled from Tom's Hardware, with thanks.


Oracle on Wednesday unveiled new clusters for AI training on Oracle Cloud Infrastructure (OCI). The most powerful of them will be based on Nvidia's upcoming Blackwell GPUs and deliver up to 2.4 zettaFLOPS of peak AI performance, which would make it more powerful than the AI cluster recently announced by Elon Musk.


Oracle's new superclusters can be configured with Nvidia's Hopper or Blackwell GPUs for AI and HPC workloads. Networking options include ultra-low-latency RoCEv2 with ConnectX-7 NICs and ConnectX-8 SuperNICs, or Nvidia's Quantum-2 InfiniBand-based networking, and customers can choose HPC storage according to their performance needs:


An OCI supercluster with H100 GPUs can support up to 16,384 GPUs, delivering 65 FP8/INT8 exaFLOPS of peak performance and 13 Pb/s (13 petabits per second) of combined network throughput.


The OCI supercluster powered by H200 GPUs will be available later this year and will scale to 65,536 GPUs, delivering up to 260 FP8/INT8 exaFLOPS and 52 Pb/s of network throughput.


Finally, the Blackwell B200 GPU-based OCI supercluster will scale to 131,072 GPUs and deliver up to 2.4 FP8/INT8 zettaFLOPS of peak performance.
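The three configurations above can be cross-checked with simple division. The following is a minimal sketch that only restates the totals quoted in the article and divides them by the GPU counts; the dictionary layout and the `per_gpu` helper are illustrative names, not anything from Oracle or Nvidia, and the article gives no network figure for the B200 cluster, so that entry is left empty.

```python
# Totals as quoted in the article: peak performance in FLOPS and
# combined network throughput in bits per second (None where not stated).
CLUSTERS = {
    "H100": {"gpus": 16_384, "peak_flops": 65e18, "network_bps": 13e15},
    "H200": {"gpus": 65_536, "peak_flops": 260e18, "network_bps": 52e15},
    "B200": {"gpus": 131_072, "peak_flops": 2.4e21, "network_bps": None},
}


def per_gpu(cluster):
    """Return (petaFLOPS per GPU, Gb/s per GPU or None) implied by the totals."""
    pflops = cluster["peak_flops"] / cluster["gpus"] / 1e15
    net = cluster["network_bps"]
    gbps = None if net is None else net / cluster["gpus"] / 1e9
    return pflops, gbps


for name, cluster in CLUSTERS.items():
    pflops, gbps = per_gpu(cluster)
    net_text = "n/a" if gbps is None else f"{gbps:.0f} Gb/s"
    print(f"{name}: {pflops:.2f} PFLOPS per GPU, {net_text} per GPU")
```

Under these figures, the H100 and H200 clusters both work out to roughly 3.97 petaFLOPS and about 793 Gb/s per GPU, while the B200 cluster implies roughly 18.3 petaFLOPS per GPU. These are peak, vendor-quoted numbers, not measured throughput.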


OCI's upcoming supercluster far exceeds the scale of today's leading systems. According to Oracle, the top-of-the-line B200-based OCI supercluster will have more than three times as many GPUs as the Frontier supercomputer (which uses 37,888 AMD Instinct MI250X GPUs) and more than six times as many as other hyperscalers offer.
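The comparison with Frontier is easy to verify from the two GPU counts quoted above. This short sketch does the division; the "six times" bound is only what Oracle's claim would imply if taken literally, since the article names no specific competing system.

```python
OCI_B200_GPUS = 131_072  # largest OCI supercluster, from the article
FRONTIER_GPUS = 37_888   # AMD Instinct MI250X count quoted in the article

# Ratio of GPU counts only; this says nothing about relative performance.
frontier_ratio = OCI_B200_GPUS / FRONTIER_GPUS
print(f"OCI vs. Frontier: {frontier_ratio:.2f}x as many GPUs")

# "More than six times" other hyperscalers implies their largest offering
# is below this many GPUs (an inference from the claim, not a quoted figure).
implied_ceiling = OCI_B200_GPUS / 6
print(f"Implied ceiling for other hyperscalers: under {implied_ceiling:,.0f} GPUs")
```

The ratio comes out at about 3.46, consistent with "more than three times".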


“We have one of the broadest AI infrastructure offerings and support customers running some of the most demanding AI workloads in the cloud,” said Mahesh Thiagarajan, executive vice president of Oracle Cloud Infrastructure. “With Oracle’s distributed cloud, customers have the flexibility to deploy cloud and AI services wherever they choose, while retaining the highest levels of data and AI sovereignty.”


Several companies are already benefiting from this advanced infrastructure. WideLabs and Zoom are leveraging OCI’s high-performance AI infrastructure to accelerate their AI development while maintaining sovereign control.


“As businesses, researchers and countries race to innovate with AI, access to powerful computing clusters and AI software is critical,” said Ian Buck, vice president of hyperscale and high-performance computing at Nvidia. “Nvidia’s full-stack AI computing platform on Oracle’s widely distributed cloud will deliver AI computing power at an unprecedented scale to advance the world’s AI efforts and help organizations around the world accelerate research, development and deployment.”


The upcoming OCI supercluster will use Nvidia's liquid-cooled GB200 NVL72 rack, in which 72 GPUs communicate with one another in a single NVLink domain with a total bandwidth of 129.6 TB/s. Oracle says Nvidia's Blackwell GPUs will be available in the first half of 2025 (Blackwell supply is limited this year), but it is unclear when OCI will offer fully populated Blackwell-powered clusters.
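The NVL72 figures can be broken down per GPU in the same way. The sketch below divides the quoted aggregate NVLink bandwidth by the 72 GPUs in a domain, and estimates how many such racks a maximum-size cluster would need. The rack count rests on the assumption that the cluster is built entirely from 72-GPU NVL72 racks, which the article does not state.

```python
import math

NVLINK_DOMAIN_TBPS = 129.6  # aggregate NVLink bandwidth per NVL72, TB/s (from the article)
GPUS_PER_RACK = 72          # GPUs in one NVLink domain (from the article)
MAX_CLUSTER_GPUS = 131_072  # largest B200-based supercluster (from the article)

# Share of the aggregate NVLink bandwidth per GPU.
per_gpu_tbps = NVLINK_DOMAIN_TBPS / GPUS_PER_RACK
print(f"NVLink bandwidth per GPU: {per_gpu_tbps:.1f} TB/s")

# Assumption: every GPU sits in a full NVL72 rack; round up for the remainder.
racks_needed = math.ceil(MAX_CLUSTER_GPUS / GPUS_PER_RACK)
print(f"NVL72 racks for {MAX_CLUSTER_GPUS:,} GPUs: {racks_needed:,}")
```

This gives 1.8 TB/s of NVLink bandwidth per GPU and, under the stated assumption, on the order of 1,800 racks for the full configuration. Traffic between racks would go over the cluster network rather than NVLink.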


The first zettascale cloud computing cluster


Oracle today announced the first zettascale cloud computing clusters accelerated by the NVIDIA Blackwell platform. Oracle Cloud Infrastructure (OCI) is now accepting orders for the largest AI supercomputer in the cloud, which can be equipped with up to 131,072 NVIDIA Blackwell GPUs.


OCI is now accepting orders for the largest AI supercomputer in the cloud, featuring up to 131,072 NVIDIA Blackwell GPUs and an unprecedented 2.4 zettaFLOPS of peak performance. At its maximum scale, the OCI Supercluster offers more than three times as many GPUs as the Frontier supercomputer and more than six times as many as other hyperscalers. The OCI Supercluster includes OCI Compute Bare Metal, ultra-low-latency RoCEv2 or NVIDIA Quantum-2 InfiniBand-based networking with ConnectX-7 NICs and ConnectX-8 SuperNICs, and a choice of HPC storage.


OCI Superclusters can be ordered with OCI Compute powered by NVIDIA H100 or H200 Tensor Core GPUs or NVIDIA Blackwell GPUs. OCI Superclusters with H100 GPUs scale to 16,384 GPUs, with up to 65 exaFLOPS of performance and 13 Pb/s of aggregate network throughput. OCI Superclusters with H200 GPUs will be available later this year and scale to 65,536 GPUs, with up to 260 exaFLOPS of performance and 52 Pb/s of aggregate network throughput. OCI Superclusters with liquid-cooled NVIDIA GB200 NVL72 bare metal instances will use NVLink and NVLink Switch to let up to 72 Blackwell GPUs communicate with one another in a single NVLink domain at an aggregate bandwidth of 129.6 TB/s. NVIDIA Blackwell GPUs will be available in the first half of 2025, with fifth-generation NVLink, NVLink Switch, and cluster networking providing seamless GPU-to-GPU communication within a single cluster.


Customers such as WideLabs and Zoom are leveraging OCI’s high-performance AI infrastructure with strong security and sovereign control.


Reference Links

https://www.tomshardware.com/tech-industry/artificial-intelligence/nvidia-and-oracle-team-up-for-zettascale-cluster-available-with-up-to-131072-blackwell-gpus

