
ExecuTorch Beta Released to Accelerate Generative AI Development on the Edge on Arm Platforms

Latest update time: 2024-11-08


Author: Alex Spinelli, Senior Vice President of Artificial Intelligence and Developer Platforms and Services, Strategy and Ecosystem, Arm

News Highlights

Combining the Arm compute platform with the ExecuTorch framework allows smaller, more highly optimized models to run at the edge, accelerating the deployment of generative AI there.

The new quantized Llama models are suited to on-device and edge AI applications on Arm platforms, reducing memory usage while improving accuracy, performance, and portability.

The world’s 20 million Arm developers can develop and deploy more intelligent AI applications faster and at scale on billions of edge devices.

Arm is working with the PyTorch team at Meta on the launch of the new ExecuTorch Beta, which aims to bring artificial intelligence (AI) and machine learning (ML) capabilities to billions of edge devices and millions of developers worldwide, ensuring that the full potential of AI is within reach of the broadest possible range of devices and developers.

Arm compute platforms optimize generative AI performance with ExecuTorch and new Llama quantization models

Arm compute platforms are ubiquitous, powering many of the world’s edge devices, while ExecuTorch is a PyTorch-native deployment framework designed for running AI models on mobile and edge devices. The close collaboration between the two enables developers to run smaller, more highly optimized models, including the new quantized Llama 3.2 1B and 3B models. These models reduce memory usage, improve accuracy, enhance performance, and offer portability, making them well suited to on-device generative AI applications such as virtual chatbots, text summarization, and AI assistants.

Developers can integrate the new quantized models into their applications without further modification or optimization, saving time and resources and allowing them to develop and deploy more intelligent AI applications quickly and at scale across a wide range of Arm-based devices.
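To make the memory-reduction claim concrete, here is a back-of-the-envelope sketch of why a 4-bit quantized 1B-parameter model fits small devices so much more easily than an fp16 one. The parameter count, block size, and scale width below are illustrative assumptions for a Llama 3.2 1B-class model, not figures published by Arm or Meta.

```python
# Back-of-the-envelope weight-memory estimate for an on-device LLM.
# All sizes are illustrative assumptions, not official model figures.

def model_bytes(n_params: int, bits_per_weight: float) -> int:
    """Approximate weight storage for n_params parameters."""
    return int(n_params * bits_per_weight / 8)

def quantized_bits(weight_bits: int, block: int, scale_bits: int = 16) -> float:
    """Effective bits per weight for blockwise quantization:
    each block of `block` weights shares one fp16 scale factor."""
    return weight_bits + scale_bits / block

N = 1_000_000_000  # ~1B parameters (Llama 3.2 1B-class model, assumed)

fp16 = model_bytes(N, 16)                     # half-precision baseline
int4 = model_bytes(N, quantized_bits(4, 32))  # 4-bit weights, 32-weight blocks

print(f"fp16 : {fp16 / 2**30:.2f} GiB")   # → 1.86 GiB
print(f"4-bit: {int4 / 2**30:.2f} GiB")   # → 0.52 GiB
print(f"ratio: {fp16 / int4:.1f}x smaller")
```

Roughly a 3.6x reduction in weight memory, which is what moves a 1B-parameter model from “barely fits” to comfortable on a phone-class device.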

With the release of the new Llama 3.2 large language model (LLM) versions, Arm is optimizing AI performance through the ExecuTorch framework, making real-world generative AI workloads run faster on edge devices built on Arm compute platforms. Developers benefit from these performance improvements from day one of the ExecuTorch Beta release.

Integrating KleidiAI to accelerate generative AI on the edge

In the mobile space, Arm’s collaboration on ExecuTorch means that many generative AI applications, such as virtual chatbots, text generation and summarization, real-time voice, and virtual assistants, can run at higher performance on devices with Arm CPUs. This is made possible by KleidiAI, which introduces micro-kernels optimized for 4-bit quantization and integrates them into ExecuTorch through XNNPACK, so that 4-bit quantized LLMs running on Arm compute platforms are accelerated seamlessly. For example, with KleidiAI integrated, the prefill phase of the quantized Llama 3.2 1B model runs up to 20% faster, allowing text generation on some Arm-based mobile devices to exceed 400 tokens per second. The result for end users is a faster, more responsive AI experience on their mobile devices.
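The 4-bit scheme those micro-kernels accelerate can be illustrated with a minimal blockwise symmetric quantizer: each block of weights is mapped to signed 4-bit integers sharing one scale factor. This is a pure-Python sketch for clarity; the block size, rounding, and scale format are assumptions, not the exact layout used by the quantized Llama 3.2 checkpoints or KleidiAI.

```python
# Minimal sketch of blockwise symmetric 4-bit quantization (illustrative;
# not the exact format used by KleidiAI or the Llama 3.2 checkpoints).

def quantize_block(weights, qmax=7):
    """Map a block of floats to signed 4-bit ints in [-qmax, qmax],
    plus one shared floating-point scale for the whole block."""
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [v * scale for v in q]

block = [0.12, -0.5, 0.31, 0.02, -0.27, 0.44, -0.09, 0.18]
q, scale = quantize_block(block)
approx = dequantize_block(q, scale)

print("codes:", q)
print("max error:", max(abs(a - b) for a, b in zip(block, approx)))
```

Because every weight in a block shares one scale, the worst-case rounding error stays within half a quantization step, which is why 4-bit models can keep accuracy close to their full-precision baselines while the kernels operate on densely packed integer data.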

Accelerate real-time processing capabilities for edge AI applications in IoT

In the IoT space, ExecuTorch will improve the real-time processing capabilities of edge AI applications, including smart home appliances, wearables, and automated retail systems. IoT devices and applications can then respond to changes in their environment within milliseconds, which is critical for safety and functional availability.

ExecuTorch runs on Arm® Cortex®-A CPUs and Arm Ethos™-U NPUs, accelerating the development and deployment of edge AI applications. In fact, by combining ExecuTorch, the Arm Corstone™-320 reference platform (also available as an emulated Fixed Virtual Platform, or FVP), the Arm Ethos-U85 NPU driver, and compiler support into a single package, developers can start building edge AI applications months before physical hardware is available.

Easier and faster edge AI development experience

ExecuTorch has the potential to become one of the world’s most popular and efficient AI and ML development frameworks. By pairing the most widely used compute platform with ExecuTorch, Arm is accelerating AI adoption through the new quantized models, enabling developers to deploy applications on more devices faster and bringing more generative AI experiences to the edge.
