Intel reveals Xeon 6 processor inference performance for Meta Llama 3 model

Publisher: EE小广播 | Last updated: 2024-04-20 | Source: EEWORLD | Keywords: Intel

Recently, Meta released Meta Llama 3, its open-source large language models with 8 billion and 70 billion parameters. The release introduces improved reasoning and additional model sizes, and uses a new tokenizer that encodes language more efficiently, improving model performance.


As soon as the model was released, Intel verified that Llama 3 runs across its broad AI product portfolio, including Intel® Xeon® processors, and disclosed the inference performance of the upcoming Intel® Xeon® 6 processor (code-named Granite Rapids) on the Meta Llama 3 model.


Intel Xeon processors can meet the needs of demanding end-to-end AI workloads. In the fifth-generation Xeon processor, for example, every core has a built-in Intel® AMX (Advanced Matrix Extensions) acceleration engine that delivers strong AI inference and training performance. The processor has already been adopted by many mainstream cloud service providers. In addition, Xeon processors deliver lower latency for general-purpose computing and can handle multiple workloads simultaneously.
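AMX accelerates matrix multiplication by operating on small fixed-size tiles of the operand matrices in hardware. A minimal pure-Python sketch of that tiling pattern (illustrative only; this is not Intel's implementation, and a 16x16 tile is assumed for clarity):

```python
# Illustrative sketch of tile-based matrix multiplication, the access
# pattern that AMX's tile-matrix-multiply (TMUL) unit accelerates in
# hardware. Pure Python, for clarity only.

TILE = 16  # assumed tile edge; AMX tiles hold up to 16 rows

def matmul_tiled(a, b, tile=TILE):
    """Multiply matrices a (m x k) and b (k x n) one tile at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # One "tile multiply-accumulate": C-tile += A-tile @ B-tile
                for i in range(i0, min(i0 + tile, m)):
                    for p in range(p0, min(p0 + tile, k)):
                        aip = a[i][p]
                        for j in range(j0, min(j0 + tile, n)):
                            c[i][j] += aip * b[p][j]
    return c
```

Keeping each tile's working set resident (in AMX's case, in dedicated tile registers) is what turns this loop nest into high sustained throughput for BF16 and INT8 inference.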


In fact, Intel has been continuously optimizing large-model inference performance on the Xeon platform. For example, software optimizations in PyTorch and Intel® Extension for PyTorch have reduced Llama 2 latency by a factor of 5 relative to the model's launch. These gains come from the PagedAttention algorithm and tensor parallelism, which together maximize the available compute and memory bandwidth. The figure below shows the inference performance of the 8-billion-parameter Meta Llama 3 model on an AWS m7i.metal-48xl instance, which is based on the fourth-generation Intel Xeon Scalable processor.
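PagedAttention avoids reserving one large contiguous KV-cache region per sequence: token positions are mapped to fixed-size physical blocks through a block table, much like virtual-memory paging. A minimal sketch of that bookkeeping (hypothetical names; simplified from what engines such as vLLM actually do, and omitting the K/V tensors themselves):

```python
# Minimal sketch of PagedAttention-style KV-cache bookkeeping.
# Assumption: simplified to block allocation and address translation;
# real engines also store key/value tensors inside each block.

BLOCK_SIZE = 16  # tokens per KV-cache block (a typical default)

class BlockTable:
    """Maps a sequence's logical token positions to physical cache blocks."""

    def __init__(self, free_blocks):
        self.free = list(free_blocks)   # pool of physical block ids
        self.blocks = []                # physical block id per logical block

    def append_token(self, position):
        # Allocate a new physical block only when crossing a block boundary,
        # so memory is committed on demand instead of reserved up front.
        if position % BLOCK_SIZE == 0:
            self.blocks.append(self.free.pop())

    def physical_slot(self, position):
        # Translate a logical token position to (physical block, offset).
        return self.blocks[position // BLOCK_SIZE], position % BLOCK_SIZE
```

Because blocks are allocated on demand and need not be contiguous, many sequences can share the cache with little fragmentation, which is how the technique raises effective memory bandwidth utilization.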



Figure 1: Next Token Latency for Llama 3 on an AWS Instance


In addition, Intel disclosed for the first time performance results for its upcoming Intel® Xeon® 6 processor (code-named Granite Rapids) on Meta Llama 3. The results show that, compared with the fourth-generation Xeon processor, the Intel Xeon 6 processor halves the inference latency of the 8-billion-parameter Llama 3 model, and can run larger models such as the 70-billion-parameter Llama 3 on a single dual-socket server with a per-token latency under 100 milliseconds.



Figure 2: Next Token Latency for Llama 3 on the Intel® Xeon® 6 Processor (Code-named Granite Rapids)
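Next-token latency and generation throughput are reciprocals, which is a quick way to read charts like the ones above. A small converter (pure arithmetic, making no assumptions about Intel's actual measured numbers):

```python
# Next-token latency (ms/token) and decode throughput (tokens/s) are
# reciprocals; these helpers convert between the two.

def tokens_per_second(next_token_latency_ms):
    """Sustained decode throughput implied by a per-token latency."""
    return 1000.0 / next_token_latency_ms

def latency_ms(tokens_per_sec):
    """Per-token latency implied by a sustained decode throughput."""
    return 1000.0 / tokens_per_sec

# A sub-100 ms next-token latency means more than 10 tokens generated
# per second, a common readability threshold for interactive use.
assert tokens_per_second(100) == 10.0
```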


Because Llama 3 uses a more token-efficient tokenizer, the test compared Llama 3 and Llama 2 using randomly selected prompts. For the same prompt, Llama 3 produced 18% fewer tokens than Llama 2. As a result, even though the 8-billion-parameter Llama 3 model is larger than the 7-billion-parameter Llama 2 model, overall prompt inference latency was nearly identical when running BF16 inference on an AWS m7i.metal-48xl instance (Llama 3 was 1.04x faster than Llama 2 in this evaluation).
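The trade-off described above can be checked with back-of-the-envelope arithmetic, under the rough assumption that end-to-end prompt latency scales with token count at a fixed per-token cost (real workloads are more complicated):

```python
# Back-of-the-envelope check of the tokenizer trade-off described above.
# Assumption: end-to-end latency scales roughly with token count at a
# fixed per-token cost; the token counts below are illustrative.

llama2_tokens = 100                           # normalized prompt token count
llama3_tokens = llama2_tokens * (1 - 0.18)    # 18% fewer tokens

# If per-token cost were identical, the token savings alone would give:
speedup_from_tokens = llama2_tokens / llama3_tokens
print(f"speedup from tokenizer alone: {speedup_from_tokens:.2f}x")  # ~1.22x

# The measured end-to-end result was 1.04x, so the larger 8B model's
# higher per-token cost absorbed most, but not all, of that saving.
```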


Developers can find instructions for running Llama 3 on Intel Xeon platforms here.


Product and Performance Information

Intel Xeon Processors:


Tested on Intel® Xeon® 6 processor (formerly code-named Granite Rapids): 2x Intel® Xeon® Platinum, 120 cores, HT on, Turbo on, NUMA 6; integrated accelerators available [used]: DLB[8], DSA[8], IAA[8], QAT[8]; total memory 1536GB (24x64GB DDR5 8800 MT/s [8800 MT/s]); BIOS BHSDCRB1.IPC.0031.D44.2403292312; microcode 0x810001d0; 1x Ethernet Controller I210 Gigabit Network Connection; 1x SSK storage 953.9G; Red Hat Enterprise Linux 9.2 (Plow), kernel 6.2.0-gnr.bkc.6.2.4.15.28.x86_64. Based on testing by Intel as of April 17, 2024.


Tested on 4th Generation Intel® Xeon® Scalable processors (formerly code-named Sapphire Rapids) using an AWS m7i.metal-48xl instance: 2x Intel® Xeon® Platinum 8488C, 48 cores, HT on, Turbo on, NUMA 2; integrated accelerators available [used]: DLB[8], DSA[8], IAA[8], QAT[8]; total memory 768GB (16x32GB DDR5 4800 MT/s [4400 MT/s]; 16x16GB DDR5 4800 MT/s [4400 MT/s]); BIOS Amazon EC2; microcode 0x2b0000590; 1x Ethernet Controller Elastic Network Adapter (ENA); Amazon Elastic Block Store (EBS) 256G; Ubuntu 22.04.4 LTS, kernel 6.5.0-1016-aws. Based on testing by Intel as of April 17, 2024.

