Hotspot | Accelerating Edge Generative AI: Arm's New NPU Delivers a 4x Performance Boost
·Focus: artificial intelligence, semiconductors, and related industries
With the continued advancement of Transformers and large-model technology, AI models have improved markedly in versatility, multi-modal compatibility, and fine-tuning efficiency. At the same time, the integration of low-power AI accelerators and dedicated chips into end devices is making edge intelligence increasingly autonomous and capable.
In vision and generative AI scenarios such as video analytics, image-text fusion, image enhancement and generation, image classification, and object detection, the Transformer architecture has proven its value. Because its attention mechanism lends itself to parallel computation, it markedly improves hardware utilization, making it feasible to deploy these models on resource-constrained edge devices.
The enormous potential of edge AI suggests it will become a key driving force in the intelligent transformation of many fields. Realizing that potential, however, raises several design challenges:
① When designing edge AI chips and systems, an appropriate balance must be struck between compute performance and energy efficiency, delivering strong performance within power and cost constraints.
② High performance usually comes with higher power consumption, yet edge devices impose strict power and cost budgets. Power must therefore be minimized without sacrificing performance, to extend device service life.
③ As more and more data is processed at the edge, data security and privacy protection become critical. Edge AI chip designs must therefore include encryption and security features to guarantee data integrity and confidentiality.
④ Given the diversity of edge AI applications, software-defined designs and easy software portability are essential for unifying varied application requirements and achieving economies of scale.
As artificial intelligence technology advances, demand for high-performance computing keeps rising. The NPU, a hardware accelerator purpose-built for deep learning and AI workloads, marks a major step forward in AI hardware architecture. Its rise stems from the broad adoption of AI and deep learning algorithms across industries, as well as the continued growth in demand for high-performance computing.
For Arm, designing high-performance products is not difficult; the key lies in defining the product precisely.
A significant difference between the Ethos-U85 and earlier products in the series is its support for Transformer models.
Through operator chaining, the Ethos-U85 fuses element-wise operations with the operations that precede them, reducing the SRAM accesses needed to write and read intermediate tensors. This optimization cuts data movement between the NPU and memory, improving the NPU's efficiency.
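The idea behind chaining can be illustrated with a minimal sketch (this is a conceptual illustration of operator fusion, not the Ethos-U85's actual internals): fusing an element-wise operation with its producer avoids materializing the intermediate tensor, so it never has to be written to and re-read from SRAM.

```python
def matmul(a, b):
    """Naive matrix multiply on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def unfused(a, b):
    # Two passes: the matmul result is materialized (one memory round trip),
    # then read back for the element-wise ReLU.
    intermediate = matmul(a, b)   # written out to memory
    return relu(intermediate)     # read back in, written out again

def fused(a, b):
    # One pass: ReLU is applied to each output element as it is produced,
    # so the intermediate tensor never exists in memory.
    return [[max(0.0, sum(a[i][k] * b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[1.0, -2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, -1.0]]
assert fused(a, b) == unfused(a, b)  # same result, fewer memory round trips
```

The result is bit-identical; only the number of memory round trips for the intermediate tensor changes, which is exactly the traffic chaining eliminates.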
As the industry's first family of AI microNPUs, the Arm Ethos-U series has drawn considerable attention.
The Ethos-U85, the third generation of the series, is designed specifically for edge AI.
Its strengths lie not only in the hardware, but also in the consistency and ease of use of its software toolchain.
The Ethos-U85 brings substantial performance and energy-efficiency gains to high-performance edge AI applications: compared with its predecessor, it delivers four times the performance and 20% better energy efficiency, while retaining a consistent toolchain that gives developers a seamless experience.
The product supports configurations from 128 to 2048 MAC units and delivers 4 TOPS of AI compute in its highest-performance configuration, enabling it to handle a range of complex AI tasks.
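As a back-of-the-envelope check, the headline figure follows directly from the MAC count (the 1 GHz clock below is an assumption for illustration, not a figure stated by Arm):

```python
# Rough sanity check of the 4 TOPS figure. The 1 GHz clock is an assumed
# value for illustration; the actual frequency depends on the SoC design.
macs = 2048                 # MAC units in the largest configuration
ops_per_mac = 2             # one multiply + one accumulate per cycle
clock_hz = 1e9              # assumed 1 GHz clock
tops = macs * ops_per_mac * clock_hz / 1e12
# 2048 * 2 * 1e9 / 1e12 = 4.096, consistent with the quoted "4 TOPS"
```
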
The Ethos-U85 targets a broad range of scenarios, including emerging edge AI fields such as smart home, retail, and industrial applications.
It supports AI acceleration not only in low-power MCU systems, but also in high-performance edge computing systems, where it integrates seamlessly with application processors, standard operating systems, and high-level development languages.
This provides solid support for cloud-native development and cloud-edge workload scheduling.
Notably, the new Ethos-U85 NPU also supports mainstream AI frameworks such as TensorFlow Lite and PyTorch.
Beyond the weighted convolution operations required by convolutional neural networks (CNNs), it also supports matrix multiplication, a fundamental building block of Transformer-architecture networks.
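To see why matrix multiplication matters so much for Transformers, consider scaled dot-product attention, whose compute is dominated by two matrix multiplies: QKᵀ and the product of the attention weights with V. A minimal pure-Python sketch (illustrative, not tied to any NPU API):

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    mx = max(row)                      # subtract max for numerical stability
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Both heavy steps -- Q K^T and the weighted sum with V -- are matrix
    multiplies, which is why hardware MatMul support is key for Transformers."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v)

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[2.0, 0.0], [0.0, 2.0]]
out = attention(q, k, v)   # output mixes rows of v, weighted toward the first
```

Since the attention weights in each row sum to one, the output is a convex combination of the rows of V, weighted toward the key most similar to the query.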
Alongside the NPU, Arm has launched a new IoT reference design platform, Corstone-320, dedicated to accelerating the deployment of voice, audio, and vision systems.
The Corstone-320 IoT reference design platform integrates Arm's highest-performance Cortex-M CPU, the Cortex-M85, the Mali-C55 ISP, and the new Ethos-U85 NPU.
It provides strong performance for a wide range of edge AI applications, such as real-time image classification, object recognition, and voice assistants with natural-language translation on smart speakers.
In addition, the Corstone-320 platform comes with comprehensive software, tools, and support, including Arm Virtual Hardware.
Its integrated hardware-software design lets developers begin software work before the physical chip is ready, greatly accelerating product launches and shortening time to market for increasingly complex edge AI devices.
With Corstone-320 as a pre-integrated, pre-verified reference design template, Arm can help partners cut the development cost of edge AI chips and shorten development cycles.
Engineers at Arm China are working to integrate its NPU driver into the accelerator subsystem, reflecting an effort to embed their technology into the broader industry ecosystem.
In addition, the Zhouyi X2 NPU launched by Arm China (Anmou Technology) offers significantly improved performance and supports open-source software, meaning it can make more efficient use of computing resources such as CPUs, GPUs, and NPUs.
This openness and compatibility are of great significance for the progress of the domestic CPU industry.
Meanwhile, Haiguang Information, a leading domestic company in CPUs and DCUs, has grown rapidly on the back of the Xinchuang (IT application innovation) industry and the AI market.
This shows that domestic CPUs are steadily gaining market standing, especially under strong demand from the AI field.
The NPU accelerator developed by Arm China provides high-performance, low-power dedicated hardware acceleration, along with rich debugging tools and multiple levels of development and debugging support.
This gives domestic CPUs strong technical backing and room for optimization, helping drive their application and growth in the AI field.
The achievement will not only advance domestic CPU technology but also strengthen the competitiveness of domestic CPUs in AI, supporting their independent, controllable development.
As large models and generative AI rise, edge AI will continue to improve user experience and cope with rapidly growing data volumes.
Through continued refinement of quantization, pruning, and clustering techniques, large models will become better suited to deployment on edge and super-terminal devices.
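As an illustration of one of these techniques, here is a minimal sketch of symmetric int8 post-training quantization. It is illustrative only, not any particular toolchain's implementation; real tools add calibration data, per-channel scales, and more.

```python
# Minimal sketch of symmetric int8 quantization, one of the compression
# techniques that shrink models for edge deployment. Illustrative only.

def quantize(weights, num_bits=8):
    """Map float weights to signed integers with a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize(w)
w_hat = dequantize(q, scale)
# int8 storage costs 1 byte per weight instead of 4 for float32,
# at the price of a small reconstruction error (at most half a scale step):
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Pruning and clustering work analogously: pruning zeroes out small weights so they compress away, and clustering shares a small codebook of weight values, both trading a little accuracy for a much smaller memory footprint.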
The collaborative pairing of large and small models across cloud, edge, and device will become an important direction for future AI products, providing strong support for empowering AI applications across industries.
References: 51CTO, "Arm Ma Jian: Using the new generation Ethos-U AI accelerator and the new IoT reference design platform"; Electronic Product World, "Arm launches the new generation Ethos-U AI accelerator and the new IoT reference design platform"; Leifeng.com, "Arm's new NPU performance is improved by 4 times, supports Transformer, and the era of edge generative AI is just around the corner."