- Running Meta's latest Llama 3.2 models on Arm CPUs delivers significant performance gains from cloud to edge, providing strong support for future AI workloads
- Meta's collaboration with Arm accelerates innovation in use cases such as personalized on-device recommendations and automation of everyday tasks
- Arm has been investing in AI for a decade and has collaborated extensively with the open source community, enabling LLMs from 1B to 90B parameters to run seamlessly on the Arm compute platform
The rapid pace of artificial intelligence (AI) development means new versions of large language models (LLMs) are released constantly. Realizing the full potential of AI, and seizing the opportunities it brings, requires deploying LLMs widely from the cloud to the edge, which brings a significant increase in compute and energy requirements. The entire ecosystem is working together on this challenge, releasing new, more efficient open source LLMs that enable AI inference workloads at scale and accelerate the delivery of new, fast AI experiences to users.
To that end, Arm and Meta have worked closely to enable the new Llama 3.2 LLMs on Arm CPUs, combining open source innovation with the strengths of the Arm compute platform and making significant progress on these AI challenges. Thanks to Arm's sustained investment and its collaboration on the new LLMs, the advantages of running AI on Arm CPUs stand out across the ecosystem, making Arm a preferred platform for AI inference developers.
Accelerating AI performance from cloud to edge
Small LLMs such as Llama 3.2 1B and 3B are critical to enabling AI inference at scale, supporting fundamental text-based generative AI workloads. Running the new Llama 3.2 3B LLM on mobile devices powered by Arm CPUs with Arm-optimized kernels speeds up prompt processing by five times and token generation by three times, reaching 19.92 tokens per second in the generation phase. This directly reduces the latency of on-device AI workloads and greatly improves the overall user experience. In addition, the more AI workloads are processed at the edge, the less data needs to travel to and from the cloud, saving both energy and cost.
Beyond running small models at the edge, Arm CPUs also support larger models, such as Llama 3.2 11B and 90B, in the cloud. The 11B and 90B models are well suited to CPU-based inference workloads in the cloud and can generate both text and images, and test results on Arm Neoverse V2 show even greater performance gains. Running the 11B image-and-text model on Arm-based AWS Graviton4 achieves 29.3 words per second in the generation phase, far exceeding the human reading speed of roughly five words per second.
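The figures above can be related with simple back-of-the-envelope arithmetic. A minimal sketch; the baseline rates below are derived from the article's stated multipliers, not measured values:

```python
# Sanity-check of the throughput figures quoted in the article.
# Derived baselines are assumptions implied by the claimed speedups.

llama_3b_mobile_tps = 19.92       # tokens/s, Llama 3.2 3B on an Arm-powered mobile device
generation_speedup = 3.0          # claimed 3x token-generation speedup
implied_baseline_tps = llama_3b_mobile_tps / generation_speedup

llama_11b_graviton4_wps = 29.3    # words/s, Llama 3.2 11B on AWS Graviton4
human_reading_wps = 5.0           # approximate human reading speed, per the article

reading_margin = llama_11b_graviton4_wps / human_reading_wps

print(f"Implied unoptimized mobile baseline: {implied_baseline_tps:.2f} tokens/s")
print(f"Graviton4 generation runs at {reading_margin:.1f}x human reading speed")
```

In other words, the 3x speedup implies the unoptimized baseline was roughly 6.6 tokens per second, and cloud generation on Graviton4 outpaces a human reader by nearly six to one.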
AI will rapidly scale through open source innovation and ecosystem collaboration
Making new LLMs like Llama 3.2 openly available is critical. Open source innovation moves at a breakneck pace: in previous releases, the open source community had new LLMs up and running on Arm in less than 24 hours.
Arm is further supporting the software community through Arm Kleidi, which lets the entire AI technology stack take full advantage of this optimized CPU performance. Kleidi unlocks the AI capabilities and performance of Arm Cortex and Neoverse CPUs in any AI framework, with no additional integration work required from application developers.
With the recent Kleidi integration into PyTorch and the ongoing integration with ExecuTorch, Arm is delivering seamless AI performance on Arm CPUs from cloud to edge. Thanks to the Kleidi integration with PyTorch, time-to-first-token for the Llama 3 LLM running on Arm-based AWS Graviton processors is 2.5 times faster.
Meanwhile, on the client side, with the support of the KleidiAI library, time-to-first-token for Llama 3 running through the llama.cpp library on the new Arm Cortex-X925 CPU is 190% faster than the reference implementation.
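Time-to-first-token and generation rate, the two metrics cited above, can be measured in a framework-independent way against any streaming token source. A minimal sketch; the generator interface here is hypothetical, standing in for whatever streaming API (llama.cpp, PyTorch, etc.) a real deployment exposes:

```python
import time
from typing import Iterable, Tuple

def measure_streaming(tokens: Iterable[str]) -> Tuple[float, float, int]:
    """Return (time-to-first-token in s, tokens/s, token count)
    for any iterable that yields tokens as they are produced."""
    start = time.perf_counter()
    ttft = 0.0
    count = 0
    for _ in tokens:
        now = time.perf_counter()
        if count == 0:
            ttft = now - start        # latency until the first token arrives
        count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else 0.0
    return ttft, rate, count

# Usage with a stand-in generator (a real LLM token stream would go here):
def dummy_stream(n):
    for i in range(n):
        time.sleep(0.001)             # simulate per-token decode latency
        yield f"tok{i}"

ttft, rate, n = measure_streaming(dummy_stream(50))
print(f"TTFT: {ttft * 1000:.1f} ms, rate: {rate:.0f} tokens/s over {n} tokens")
```

Wrapping the production token stream in a harness like this is how speedup claims such as "2.5x faster time-to-first-token" are typically compared: run the same prompt through the baseline and optimized builds and compare the first-token latencies.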
Building the future of AI
The collaboration between Arm and Meta sets a new benchmark for industry cooperation, combining the flexibility, ubiquity, and AI capabilities of the Arm compute platform with the technical expertise of industry leaders such as Meta to unlock new opportunities for the widespread adoption of AI. Whether using on-device LLMs to meet personalized user needs, such as performing tasks based on a user's location, schedule, and preferences, or improving productivity through enterprise applications so users can focus on strategic work, Arm technology lays the foundation for what comes next. Devices will no longer be mere command-and-control tools; they will play an active role in improving the overall user experience.
Running Meta's latest Llama 3.2 models on Arm CPUs significantly improves AI performance. Open collaboration of this kind is the best way to achieve ubiquitous AI innovation and promote the sustainable development of AI. Through new LLMs, the open source community, and its compute platform, Arm is building the future of AI: by 2025, more than 100 billion Arm-based devices will be AI-capable.
Additional Resources
For mobile and edge ecosystem developers, Llama 3.2 runs efficiently on devices based on Arm Cortex CPUs. See our documentation for developer resources.
Developers can access Arm from all major cloud providers and run Llama 3.2 in the cloud on Arm Neoverse CPUs. See our documentation to learn how to get started.