Llama 3.2 LLMs accelerate and scale AI inference with the support of the Arm compute platform

Publisher: EE小广播 | Last updated: 2024-09-26 | Source: EEWORLD | Keywords: Arm
  • Running Meta's latest Llama 3.2 models on Arm CPUs significantly improves performance from the cloud to the edge, providing strong support for future AI workloads

  • Meta’s collaboration with Arm accelerates innovation in use cases such as personalized on-device recommendations and automation of daily tasks

  • Arm has been actively investing in AI for a decade and collaborates extensively with the open source community, enabling LLMs from 1B to 90B parameters to run seamlessly on the Arm compute platform

The rapid development of artificial intelligence (AI) means that new versions of large language models (LLMs) are released constantly. Realizing AI's full potential and seizing the opportunities it brings requires deploying LLMs widely, from the cloud to the edge, which in turn drives a significant increase in compute and energy demands. The ecosystem is working together to meet this challenge, continually releasing new, more efficient open source LLMs that enable AI inference workloads at scale and accelerate the delivery of new, fast AI experiences to users.


To this end, Arm has worked closely with Meta to enable the new Llama 3.2 LLMs on Arm CPUs, combining open source innovation with the strengths of the Arm compute platform and making significant progress against these AI challenges. Thanks to Arm's sustained investment and its collaboration around new LLMs, the advantages of running AI on Arm CPUs stand out across the ecosystem, making Arm a preferred platform for AI inference developers.


Accelerating AI performance from cloud to edge


Small LLMs such as Llama 3.2 1B and 3B are key to enabling AI inference at scale, supporting fundamental text-based generative AI workloads. Running the new Llama 3.2 3B LLM on Arm-powered mobile devices with Arm-optimized CPU kernels speeds up prompt processing by five times and token generation by three times, reaching 19.92 tokens per second in the generation phase. This directly reduces the latency of on-device AI workloads and greatly improves the overall user experience. Moreover, the more AI work that is handled at the edge, the less data has to travel to and from the cloud, saving both energy and cost.

Beyond running small models at the edge, Arm CPUs can also run larger models, such as Llama 3.2 11B and 90B, in the cloud. The 11B and 90B models are well suited to cloud CPU-based inference workloads that span text and images, and test results on Arm Neoverse V2 show even greater performance gains: running the 11B image-and-text model on Arm-based AWS Graviton4 reaches 29.3 tokens per second in the generation phase, far exceeding a typical human reading speed of around five tokens per second.
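
To put the tokens-per-second figure in context, the sketch below shows one way to measure generation throughput for a Llama model on an Arm CPU instance using PyTorch and Hugging Face Transformers; this is an illustrative example, not Arm's benchmarking setup. The model ID, prompt and generation length are assumptions, and the text-only 3B checkpoint stands in for the multimodal 11B model, which loads through a different model class. Actual numbers depend on the instance type and on how PyTorch was built for the platform.

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; any Llama 3.2 text model can be substituted.
model_id = "meta-llama/Llama-3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bfloat16 is natively supported on recent Neoverse-based CPUs such as Graviton3/4.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Explain why CPU-based inference in the cloud can be cost-effective."
inputs = tokenizer(prompt, return_tensors="pt")

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"Generation throughput: {new_tokens / elapsed:.1f} tokens/s")
```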

AI will rapidly scale through open source innovation and ecosystem collaboration


Having new LLMs like Llama 3.2 publicly available is critical. Open source innovation is happening at a breakneck pace, and in previous releases the open source community was able to get new LLMs up and running on Arm in less than 24 hours.

Arm will further support the software community through Arm Kleidi, enabling the entire AI technology stack to take full advantage of this optimized CPU performance. Kleidi unlocks the AI capabilities and performance of Arm Cortex and Neoverse CPUs on any AI framework, without requiring additional integration work from application developers.

With Kleidi's recent integration into PyTorch and its ongoing integration into ExecuTorch, Arm is giving developers seamless, Arm CPU-based AI performance from the cloud to the edge. Thanks to the Kleidi integration with PyTorch, the time to first token for the Llama 3 LLM running on Arm-based AWS Graviton processors is 2.5 times faster.
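
As a rough illustration of the time-to-first-token metric cited above (not the methodology behind Arm's measurements), the sketch below streams output from a Transformers model and times the arrival of the first decoded chunk. The model ID and prompt are assumptions.

```python
import time
from threading import Thread

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Summarize the benefits of on-device AI.", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

start = time.perf_counter()
# Run generation in a background thread so the stream can be consumed as it arrives.
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=64))
thread.start()

first_chunk = next(streamer)  # blocks until the first decoded text arrives
print(f"Time to first token (approx.): {time.perf_counter() - start:.2f}s")

for _ in streamer:  # drain the rest of the stream
    pass
thread.join()
```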


Meanwhile, on the client side, with support from the KleidiAI library, the time to first token for Llama 3 running through the llama.cpp library on the new Arm Cortex-X925 CPU is 190% faster than the reference implementation.
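
For the client-side path described here, a minimal sketch of running a quantized Llama model through llama.cpp, using its llama-cpp-python bindings, might look like the following. The GGUF file name, context size and thread count are assumptions; whether KleidiAI-accelerated kernels are actually used depends on how the llama.cpp build was configured for the target Arm CPU.

```python
from llama_cpp import Llama

# Assumed local GGUF file: any quantized Llama 3 / Llama 3.2 model converted
# to GGUF works the same way.
llm = Llama(
    model_path="llama-3.2-3b-instruct-q4_0.gguf",
    n_ctx=2048,    # context window
    n_threads=4,   # match the number of big cores on the target SoC
)

result = llm("What can an on-device LLM assistant do for me?", max_tokens=64)
print(result["choices"][0]["text"])
```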

Building the future of AI


The collaboration between Arm and Meta sets a new benchmark for how the industry can work together, combining the flexibility, ubiquity and AI capabilities of the Arm compute platform with the technical expertise of industry leaders such as Meta to unlock new opportunities for the widespread adoption of AI. Whether it is on-device LLMs serving users' personalized needs, such as performing tasks based on their location, schedule and preferences, or enterprise applications improving productivity so that users can focus on strategic work, Arm technology lays the foundation for this future. Devices will no longer be mere command-and-control tools; they will play an active role in improving the overall user experience.

Running Meta's latest Llama 3.2 models on Arm CPUs has delivered significant AI performance gains. Open collaboration of this kind is the best way to achieve ubiquitous AI innovation and to develop AI sustainably. Through new LLMs, the open source community and the Arm compute platform, Arm is building the future of AI: by 2025, more than 100 billion Arm-based devices will be AI-capable.


Additional Resources


For mobile and edge ecosystem developers, Llama 3.2 runs efficiently on devices based on Arm Cortex CPUs. See our documentation for developer resources.


Developers can access Arm from all major cloud providers and run Llama 3.2 in the cloud on Arm Neoverse CPUs. See our documentation to learn how to get started.

