Hotspot | Accelerating Edge Generative AI: Arm's New NPU Delivers a 4x Performance Boost
·Focus: artificial intelligence, semiconductors, and related industries
With the continued advancement of Transformers and large-model technology, AI models have improved markedly in versatility, multi-modal compatibility, and fine-tuning efficiency. At the same time, the integration of low-power AI accelerators and dedicated chips into end devices is making edge intelligence increasingly autonomous and capable.
In vision and generative AI scenarios such as video analytics, image-text fusion, image enhancement and generation, image classification, and object detection, the Transformer architecture has proven its value. Because its attention mechanism lends itself to parallel computation, it markedly improves hardware utilization, making it feasible to deploy these models on resource-constrained edge devices.
The enormous potential of edge AI suggests it will become a key driving force in the intelligent transformation of many fields. Realizing that potential, however, raises several design challenges:
① When designing edge AI chips and systems, an appropriate balance must be struck between compute performance and energy efficiency, delivering strong performance within power and cost constraints.
② High performance usually comes with higher power consumption, yet edge devices impose strict power and cost budgets. Power must therefore be minimized without sacrificing performance, to extend device service life.
③ As more and more data is processed at the edge, data security and privacy protection become critical. Edge AI chip designs must therefore include encryption and security features to guarantee data integrity and confidentiality.
④ Given the diversity of edge AI applications, software-defined designs and easy software portability are essential for unifying varied application requirements and achieving economies of scale.
As artificial intelligence technology advances, demand for high-performance computing keeps rising. The NPU, a hardware accelerator purpose-built for deep learning and AI workloads, marks a major step forward in AI hardware architecture. Its rise stems from the broad adoption of AI and deep learning algorithms across industries, as well as the continued growth in demand for high-performance computing.
For Arm, designing high-performance products is not difficult; the key lies in defining the product precisely.
A significant difference between the Ethos-U85 and earlier products in the series is its support for Transformer models.
Through operator chaining, the Ethos-U85 fuses element-wise operations with the operations that precede them, reducing the SRAM accesses needed to write and read intermediate tensors. This optimization cuts data movement between the NPU and memory, improving the NPU's efficiency.
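The idea behind chaining can be illustrated with a minimal sketch (this is a conceptual illustration of operator fusion, not the Ethos-U85's actual internals): fusing an element-wise operation with its producer avoids materializing the intermediate tensor, so it never has to be written to and re-read from SRAM.

```python
def matmul(a, b):
    """Naive matrix multiply on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def unfused(a, b):
    # Two passes: the matmul result is materialized (one memory round trip),
    # then read back for the element-wise ReLU.
    intermediate = matmul(a, b)   # written out to memory
    return relu(intermediate)     # read back in, written out again

def fused(a, b):
    # One pass: ReLU is applied to each output element as it is produced,
    # so the intermediate tensor never exists in memory.
    return [[max(0.0, sum(a[i][k] * b[k][j] for k in range(len(b))))
             for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[1.0, -2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, -1.0]]
assert fused(a, b) == unfused(a, b)  # same result, fewer memory round trips
```

The result is bit-identical; only the number of memory round trips for the intermediate tensor changes, which is exactly the traffic chaining eliminates.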
As the industry's first family of AI microNPUs, the Arm Ethos-U series has drawn considerable attention.
The Ethos-U85, the third generation of the series, is designed specifically for edge AI.
Its strengths lie not only in the hardware, but also in the consistency and ease of use of its software toolchain.
The Ethos-U85 brings substantial performance and energy-efficiency gains to high-performance edge AI applications: compared with its predecessor, it delivers four times the performance and 20% better energy efficiency, while retaining a consistent toolchain that gives developers a seamless experience.
The product supports configurations from 128 to 2048 MAC units and delivers 4 TOPS of AI compute in its highest-performance configuration, enabling it to handle a range of complex AI tasks.
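As a back-of-the-envelope check, the headline figure follows directly from the MAC count (the 1 GHz clock below is an assumption for illustration, not a figure stated by Arm):

```python
# Rough sanity check of the 4 TOPS figure. The 1 GHz clock is an assumed
# value for illustration; the actual frequency depends on the SoC design.
macs = 2048                 # MAC units in the largest configuration
ops_per_mac = 2             # one multiply + one accumulate per cycle
clock_hz = 1e9              # assumed 1 GHz clock
tops = macs * ops_per_mac * clock_hz / 1e12
# 2048 * 2 * 1e9 / 1e12 = 4.096, consistent with the quoted "4 TOPS"
```
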
The Ethos-U85 targets a broad range of scenarios, including emerging edge AI fields such as smart home, retail, and industrial applications.
It supports AI acceleration not only in low-power MCU systems, but also in high-performance edge computing systems, where it integrates seamlessly with application processors, standard operating systems, and high-level development languages.
This provides solid support for cloud-native development and cloud-edge workload scheduling.
Notably, the new Ethos-U85 NPU also supports mainstream AI frameworks such as TensorFlow Lite and PyTorch.
Beyond the weighted convolution operations required by convolutional neural networks (CNNs), it also supports matrix multiplication, a fundamental building block of Transformer-architecture networks.
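To see why matrix multiplication matters so much for Transformers, consider scaled dot-product attention, whose compute is dominated by two matrix multiplies: QKᵀ and the product of the attention weights with V. A minimal pure-Python sketch (illustrative, not tied to any NPU API):

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    mx = max(row)                      # subtract max for numerical stability
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Both heavy steps -- Q K^T and the weighted sum with V -- are matrix
    multiplies, which is why hardware MatMul support is key for Transformers."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v)

q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[2.0, 0.0], [0.0, 2.0]]
out = attention(q, k, v)   # output mixes rows of v, weighted toward the first
```

Since the attention weights in each row sum to one, the output is a convex combination of the rows of V, weighted toward the key most similar to the query.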
Alongside the NPU, Arm has launched a new IoT reference design platform, Corstone-320, dedicated to accelerating the deployment of voice, audio, and vision systems.
The Corstone-320 IoT reference design platform integrates Arm's highest-performance Cortex-M CPU, the Cortex-M85, the Mali-C55 ISP, and the new Ethos-U85 NPU.
It provides strong performance for a wide range of edge AI applications, such as real-time image classification, object recognition, and voice assistants with natural-language translation on smart speakers.
In addition, the Corstone-320 platform comes with comprehensive software, tools, and support, including Arm Virtual Hardware.
Its integrated hardware-software design lets developers begin software work before the physical chip is ready, greatly accelerating product launches and shortening time to market for increasingly complex edge AI devices.
With Corstone-320 as a pre-integrated, pre-verified reference design template, Arm can help partners cut the development cost of edge AI chips and shorten development cycles.
Engineers at Arm China are working to integrate its NPU driver into the accelerator subsystem, reflecting an effort to embed their technology into the broader industry ecosystem.
In addition, the Zhouyi X2 NPU launched by Arm China (Anmou Technology) offers significantly improved performance and supports open-source software, meaning it can make more efficient use of computing resources such as CPUs, GPUs, and NPUs.
This openness and compatibility are of great significance for the progress of the domestic CPU industry.
Meanwhile, Haiguang Information, a leading domestic company in CPUs and DCUs, has grown rapidly on the back of the Xinchuang (IT application innovation) industry and the AI market.
This shows that domestic CPUs are steadily gaining market standing, especially under strong demand from the AI field.
The NPU accelerator developed by Arm China provides high-performance, low-power dedicated hardware acceleration, along with rich debugging tools and multiple levels of development and debugging support.
This gives domestic CPUs strong technical backing and room for optimization, helping drive their application and growth in the AI field.
The achievement will not only advance domestic CPU technology but also strengthen the competitiveness of domestic CPUs in AI, supporting their independent, controllable development.
As large models and generative AI rise, edge AI will continue to improve user experience and cope with rapidly growing data volumes.
Through continued refinement of quantization, pruning, and clustering techniques, large models will become better suited to deployment on edge and super-terminal devices.
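As an illustration of one of these techniques, here is a minimal sketch of symmetric int8 post-training quantization. It is illustrative only, not any particular toolchain's implementation; real tools add calibration data, per-channel scales, and more.

```python
# Minimal sketch of symmetric int8 quantization, one of the compression
# techniques that shrink models for edge deployment. Illustrative only.

def quantize(weights, num_bits=8):
    """Map float weights to signed integers with a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize(w)
w_hat = dequantize(q, scale)
# int8 storage costs 1 byte per weight instead of 4 for float32,
# at the price of a small reconstruction error (at most half a scale step):
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

Pruning and clustering work analogously: pruning zeroes out small weights so they compress away, and clustering shares a small codebook of weight values, both trading a little accuracy for a much smaller memory footprint.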
The collaborative pairing of large and small models across cloud, edge, and device will become an important direction for future AI products, providing strong support for empowering AI applications across industries.
References: 51CTO, "Arm Ma Jian: Using the new generation Ethos-U AI accelerator and the new IoT reference design platform"; Electronic Product World, "Arm launches the new generation Ethos-U AI accelerator and the new IoT reference design platform"; Leifeng.com, "Arm's new NPU performance is improved by 4 times, supports Transformer, and the era of edge generative AI is just around the corner."