The “Grace” CPU uses energy-efficient Arm cores to deliver a 10x performance boost for systems training giant AI models
Swiss National Supercomputing Center and U.S. Department of Energy’s Los Alamos National Laboratory First to Build Supercomputers Powered by NVIDIA CPUs
SANTA CLARA, Calif. — GTC — April 12, 2021 — NVIDIA unveiled its first data center CPU, an Arm-based processor that delivers 10 times the performance of today’s fastest servers on the most complex AI and high-performance computing workloads.
The result of more than 10,000 engineering years of work, the NVIDIA Grace™ CPU is designed to meet the computing requirements of the world’s most advanced applications, including natural language processing, recommender systems and AI supercomputing, which demand ultra-fast compute performance and massive memory for analyzing enormous datasets. It combines energy-efficient Arm CPU cores with an innovative low-power memory subsystem to deliver high performance with great efficiency.
“Cutting-edge AI and data science are pushing today’s computer architectures beyond their limits to process data at unimaginable scale,” said Jensen Huang, founder and CEO of NVIDIA. “Using licensed Arm IP, NVIDIA has designed Grace as a CPU specifically for giant-scale AI and HPC. Coupled with the GPU and DPU, Grace gives us a third foundational technology for computing, and the ability to re-architect the data center to advance AI. NVIDIA is now a three-chip company.”
Grace is a highly specialized processor for workloads such as training next-generation NLP models with more than 1 trillion parameters. When tightly coupled with NVIDIA GPUs, systems powered by Grace CPUs are 10 times faster than today’s most advanced NVIDIA DGX™-based systems running on x86 CPUs.
While the vast majority of data centers will continue to be served by existing CPUs, Grace — named after Grace Hopper, the American pioneer of computer programming — will serve a niche segment of computing.
The Swiss National Supercomputing Center (CSCS) and the U.S. Department of Energy's Los Alamos National Laboratory were the first to announce plans to build supercomputers powered by Grace to support national scientific research efforts.
NVIDIA launched Grace against the backdrop of exponential growth in data volumes and AI model sizes. Today’s largest AI models contain billions of parameters, and the number of parameters doubles every two and a half months. Training these models requires a new CPU that is tightly coupled to the GPU to eliminate system bottlenecks.
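To put the growth rate cited above in perspective, here is a minimal sketch (an illustration of the stated figure, not an NVIDIA calculation) of what doubling every 2.5 months implies over a year:

```python
# Illustrative arithmetic only: the press release states that the number of
# parameters in the largest AI models doubles every 2.5 months.

def growth_factor(months: float, doubling_period: float = 2.5) -> float:
    """Multiplicative growth after `months`, given a fixed doubling period
    (in months)."""
    return 2 ** (months / doubling_period)

if __name__ == "__main__":
    # 12 months at a 2.5-month doubling period is 2**(12/2.5), roughly 28x.
    print(f"Growth over one year: {growth_factor(12):.1f}x")
```

At that pace, a model grows by nearly 28x in a single year, which is why the release frames CPU–GPU coupling as a system-bottleneck problem rather than a raw-compute one.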
NVIDIA built Grace to take advantage of the tremendous flexibility of the Arm data center architecture. With the launch of new server-class CPUs, NVIDIA is advancing its goal of technological diversity in the fields of AI and HPC, where more choice is key to achieving the innovation needed to solve the world's most pressing problems.
“As the world’s most widely licensed processor architecture, Arm is driving innovation in incredible new ways every day,” said Simon Segars, CEO of Arm. “The launch of NVIDIA’s Grace data center CPU is a clear example of how Arm’s licensing model is enabling an important innovation that will further support the incredible work of AI researchers and scientists around the world.”
Grace's first adopters push the limits of science and AI
CSCS and Los Alamos National Laboratory both plan to bring Grace-powered supercomputers, built by Hewlett Packard Enterprise, online in 2023.
“The new NVIDIA Grace CPU allows us to merge AI techniques with traditional supercomputing to tackle some of the most difficult problems in computational science,” said Professor Thomas Schulthess, director of CSCS. “We are excited to make this new NVIDIA CPU available to our customers in Switzerland and around the world for processing and analyzing large and complex scientific datasets.”
“This next-generation system will reshape our institution’s computing strategy through an innovative balance of memory bandwidth and capacity,” said Thom Mason, director of Los Alamos National Laboratory. “With NVIDIA’s new Grace CPU, we can perform advanced science with high-fidelity 3D simulations and analysis on larger data sets than ever before.”
Achieving breakthrough performance
Grace’s powerful performance is based on fourth-generation NVIDIA NVLink® interconnect technology, which provides a record-breaking 900 GB/s connection speed between Grace and NVIDIA GPUs, enabling 30 times more total bandwidth than today’s leading servers.
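A back-of-the-envelope sketch shows why link bandwidth matters at this scale. The 900 GB/s figure comes from the text; the model size, FP16 precision and the 64 GB/s comparison link are illustrative assumptions, not NVIDIA data:

```python
# Rough transfer-time estimate: streaming the FP16 weights of a
# 1-trillion-parameter model (2 bytes per parameter, ~2 TB total)
# across a CPU-GPU link at a given bandwidth.

def transfer_time_s(num_params: int, bytes_per_param: int, gb_per_s: float) -> float:
    """Seconds to move num_params * bytes_per_param bytes at gb_per_s GB/s."""
    total_gb = num_params * bytes_per_param / 1e9
    return total_gb / gb_per_s

if __name__ == "__main__":
    trillion = 10 ** 12
    # 900 GB/s is the NVLink figure quoted above; 64 GB/s is an assumed
    # stand-in for a conventional PCIe-class CPU-GPU link.
    print(f"@900 GB/s: {transfer_time_s(trillion, 2, 900):.2f} s")
    print(f"@ 64 GB/s: {transfer_time_s(trillion, 2, 64):.2f} s")
```

Under these assumptions, a full pass over the weights drops from about half a minute to a couple of seconds, which is the kind of gap the 30x aggregate-bandwidth claim is describing.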
Grace will also leverage an innovative LPDDR5x memory subsystem, which delivers twice the bandwidth and 10 times the energy efficiency of DDR4 memory. In addition, the new architecture provides a single, cache-coherent memory address space that combines system memory with GPU HBM to simplify programmability.
Grace will be supported by the NVIDIA HPC software development kit and a full suite of CUDA® and CUDA-X™ libraries to accelerate more than 2,000 GPU applications, enabling faster discovery for scientists and researchers tackling the world’s most critical challenges.