
ExecuTorch Beta Released to Accelerate Generative AI Development on the Edge on Arm Platforms

Latest update time: 2024-11-08


Author: Alex Spinelli, Senior Vice President of Artificial Intelligence and Developer Platforms and Services, Strategy and Ecosystem, Arm

News Highlights

Combining the Arm compute platform with the ExecuTorch framework allows smaller, more highly optimized models to run at the edge, accelerating the deployment of generative AI there.

The new quantized Llama models are suited to on-device and edge AI applications on Arm platforms, reducing memory usage while improving accuracy, performance, and portability.

The world’s 20 million Arm developers can develop and deploy more intelligent AI applications faster and at scale on billions of edge devices.

Arm is working with the PyTorch team at Meta on the launch of the new ExecuTorch Beta, which aims to bring artificial intelligence (AI) and machine learning (ML) capabilities to billions of edge devices and millions of developers worldwide, ensuring that the full potential of AI is within reach of the broadest possible range of devices and developers.

Arm compute platforms optimize generative AI performance with ExecuTorch and new Llama quantization models

Arm compute platforms are ubiquitous, powering many of the world’s edge devices, while ExecuTorch is a PyTorch-native deployment framework designed for running AI models on mobile and edge devices. The close collaboration between the two enables developers to run smaller, more highly optimized models, including the new quantized Llama 3.2 1B and 3B models. These models reduce memory usage, improve accuracy, enhance performance, and offer portability, making them well suited to on-device generative AI applications such as virtual chatbots, text summarization, and AI assistants.

Developers can integrate the new quantized models into their applications without further modification or optimization, saving time and resources and allowing them to develop and deploy more intelligent AI applications quickly and at scale across a wide range of Arm-based devices.
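To make the memory-reduction claim concrete, here is a back-of-the-envelope sketch of why a 4-bit quantized 1B-parameter model fits small devices so much more easily than an fp16 one. The parameter count, block size, and scale width below are illustrative assumptions for a Llama 3.2 1B-class model, not figures published by Arm or Meta.

```python
# Back-of-the-envelope weight-memory estimate for an on-device LLM.
# All sizes are illustrative assumptions, not official model figures.

def model_bytes(n_params: int, bits_per_weight: float) -> int:
    """Approximate weight storage for n_params parameters."""
    return int(n_params * bits_per_weight / 8)

def quantized_bits(weight_bits: int, block: int, scale_bits: int = 16) -> float:
    """Effective bits per weight for blockwise quantization:
    each block of `block` weights shares one fp16 scale factor."""
    return weight_bits + scale_bits / block

N = 1_000_000_000  # ~1B parameters (Llama 3.2 1B-class model, assumed)

fp16 = model_bytes(N, 16)                     # half-precision baseline
int4 = model_bytes(N, quantized_bits(4, 32))  # 4-bit weights, 32-weight blocks

print(f"fp16 : {fp16 / 2**30:.2f} GiB")   # → 1.86 GiB
print(f"4-bit: {int4 / 2**30:.2f} GiB")   # → 0.52 GiB
print(f"ratio: {fp16 / int4:.1f}x smaller")
```

Roughly a 3.6x reduction in weight memory, which is what moves a 1B-parameter model from “barely fits” to comfortable on a phone-class device.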

With the release of the new Llama 3.2 large language model (LLM) versions, Arm is optimizing AI performance through the ExecuTorch framework, making real-world generative AI workloads run faster on edge devices built on Arm compute platforms. Developers benefit from these performance improvements from day one of the ExecuTorch Beta release.

Integrating KleidiAI to accelerate generative AI on the edge

In the mobile space, Arm’s collaboration on ExecuTorch means that many generative AI applications, such as virtual chatbots, text generation and summarization, real-time voice, and virtual assistants, can run at higher performance on devices with Arm CPUs. This is made possible by KleidiAI, which introduces micro-kernels optimized for 4-bit quantization and integrates them into ExecuTorch through XNNPACK, so that 4-bit quantized LLMs running on Arm compute platforms are accelerated seamlessly. For example, with KleidiAI integrated, the prefill phase of the quantized Llama 3.2 1B model runs up to 20% faster, allowing text generation on some Arm-based mobile devices to exceed 400 tokens per second. The result for end users is a faster, more responsive AI experience on their mobile devices.
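The 4-bit scheme those micro-kernels accelerate can be illustrated with a minimal blockwise symmetric quantizer: each block of weights is mapped to signed 4-bit integers sharing one scale factor. This is a pure-Python sketch for clarity; the block size, rounding, and scale format are assumptions, not the exact layout used by the quantized Llama 3.2 checkpoints or KleidiAI.

```python
# Minimal sketch of blockwise symmetric 4-bit quantization (illustrative;
# not the exact format used by KleidiAI or the Llama 3.2 checkpoints).

def quantize_block(weights, qmax=7):
    """Map a block of floats to signed 4-bit ints in [-qmax, qmax],
    plus one shared floating-point scale for the whole block."""
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [v * scale for v in q]

block = [0.12, -0.5, 0.31, 0.02, -0.27, 0.44, -0.09, 0.18]
q, scale = quantize_block(block)
approx = dequantize_block(q, scale)

print("codes:", q)
print("max error:", max(abs(a - b) for a, b in zip(block, approx)))
```

Because every weight in a block shares one scale, the worst-case rounding error stays within half a quantization step, which is why 4-bit models can keep accuracy close to their full-precision baselines while the kernels operate on densely packed integer data.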

Accelerate real-time processing capabilities for edge AI applications in IoT

In the IoT space, ExecuTorch will improve the real-time processing capabilities of edge AI applications, including smart home appliances, wearables, and automated retail systems. IoT devices and applications can then respond to changes in their environment within milliseconds, which is critical for safety and functional availability.

ExecuTorch runs on Arm® Cortex®-A CPUs and Arm Ethos™-U NPUs, accelerating the development and deployment of edge AI applications. In fact, by combining ExecuTorch, the Arm Corstone™-320 reference platform (also available as an emulated Fixed Virtual Platform, or FVP), the Arm Ethos-U85 NPU driver, and compiler support into a single package, developers can start building edge AI applications months before physical hardware is available.

Easier and faster edge AI development experience

ExecuTorch has the potential to become one of the world’s most popular and efficient AI and ML development frameworks. By pairing the most widely used compute platform with ExecuTorch, Arm is accelerating AI adoption through the new quantized models, enabling developers to deploy applications on more devices faster and bringing more generative AI experiences to the edge.
