Intelligence is everywhere: Arm Technology's "Zhouyi" NPU opens a new era of edge AI

Publisher: EE小广播 | Latest update: 2024-11-07 | Source: EEWORLD

Large AI models are moving from the cloud down to end devices. This shift not only speeds up data processing but also lifts the intelligent experience to a new level: on-device models can capture and respond to subtle user needs quickly, extending AI into every corner of daily life.




Recently, at the 12th China Hard Technology Industry Chain Innovation Trend Summit and Hundred Media Forum hosted by EEVIA, Bao Minqi, product director of Arm Technology, delivered a keynote speech entitled "Opportunities for AI Application Chips at the Edge: NPUs Accelerate the Upgrade of Terminal Computing Power". He analyzed the prospects for edge AI and detailed the latest progress of Arm Technology's self-developed NPU.




The rise of edge AI


The leap in computing power demanded by AIGC large models is the biggest opportunity for edge AI. Bao Minqi noted that recent product launches from leading manufacturers show the industry has reached a consensus on edge AI applications.




At present, the mainstream large models actually deployed at home and abroad are mostly below 10 billion parameters. This limit stems mainly from the memory bandwidth of end-side devices, which is typically 50-100 GB/s. To meet users' demand for real-time response, models with 1-3 billion parameters are the best fit under existing bandwidth conditions: they deliver fast responses and high-quality service while remaining efficient.
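To make the bandwidth reasoning concrete, here is a rough, illustrative calculation (not from the talk): if autoregressive decoding is bound by memory bandwidth, the decode rate is capped at roughly the bandwidth divided by the bytes of weights read per token. The bandwidth values, int8 weight format, and model sizes below are assumptions chosen to match the ranges mentioned above.

```python
# Rough, illustrative estimate (all numbers are assumptions, not measurements):
# autoregressive decoding is typically memory-bandwidth bound, since each
# generated token reads roughly every weight once.

def est_tokens_per_second(params_billion: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode rate = bandwidth / bytes of weights read per token."""
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

for params in (1, 3, 7):            # model size in billions of parameters
    for bw in (50, 100):            # assumed edge memory bandwidth in GB/s
        tps = est_tokens_per_second(params, bytes_per_param=1.0, bandwidth_gb_s=bw)
        print(f"{params}B params, {bw} GB/s, int8 weights -> ~{tps:.0f} tokens/s")
```

Under these assumptions, a 1-3 billion parameter model lands in the tens of tokens per second, while a 7 billion parameter model at 50 GB/s drops to single digits, which helps explain why sub-10-billion-parameter models dominate on-device deployment.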


Leading terminal manufacturers such as OPPO, vivo, Xiaomi, Honor and Huawei are actively promoting the development of end-side AI. They have not only developed large models suited to end-side deployment, but also integrated them closely with specific business scenarios. Chip manufacturers have likewise reached a consensus that the AI NPU (neural network processing unit) will be the focus of future consumer electronics development. Through a specially optimized hardware architecture, an NPU can greatly improve the AI computing capability of end-side devices while reducing power consumption.


Despite the strong development momentum of end-side AI, Bao Minqi emphasized that this does not mean that cloud-side AI should be completely abandoned. On the contrary, he believes that the two should complement each other to generate the greatest benefits. The advantage of end-side AI lies in its timeliness and the security brought by data localization. Since data processing occurs locally on the device, the user's privacy is better protected, and real-time response can also be achieved. Cloud-side AI has stronger reasoning ability and large-scale data processing capabilities, and can perform more complex tasks. Therefore, combining the advantages of the end-side and the cloud will provide users with a more comprehensive and efficient AI experience.


Looking at the history of human-computer interaction, from physical buttons to touchscreens and voice interaction, and now to AI agents, each change has greatly improved the user experience. The future trend is multimodal scenarios: combining input methods such as images, audio, and video so that devices understand user needs more comprehensively. By observing and learning, future AI systems will better predict and meet user expectations, achieving true intelligence.


Responding to triple challenges with triple upgrades


The rapid development of edge AI brings three challenges to hardware devices: cost, power consumption, and ecosystem.


The cost challenge comes mainly from the device's storage capacity, bandwidth, and on-chip computing resources. The power consumption challenge comes largely from moving large amounts of data: unlike CNN weights, large-model weights cannot be heavily reused, which greatly increases data traffic and therefore power. Finally, the continuous optimization and support of development tools is also a challenge.
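As a back-of-the-envelope illustration of why data movement dominates power, the sketch below compares the energy of streaming a large model's weights from off-chip DRAM against keeping the same traffic in on-chip memory. The per-byte energy costs are generic order-of-magnitude assumptions, not figures for the "Zhouyi" NPU or any specific chip.

```python
# Back-of-the-envelope energy comparison for data movement.
# The per-byte costs are rough, order-of-magnitude assumptions:
# off-chip DRAM access costs far more energy per byte than on-chip SRAM.

DRAM_PJ_PER_BYTE = 150.0   # assumed off-chip access cost (illustrative)
SRAM_PJ_PER_BYTE = 5.0     # assumed on-chip access cost (illustrative)

def movement_energy_mj(bytes_moved: float, pj_per_byte: float) -> float:
    """Energy in millijoules for moving the given number of bytes."""
    return bytes_moved * pj_per_byte * 1e-12 * 1e3  # pJ -> J -> mJ

weights_bytes = 3e9  # e.g. a 3B-parameter model stored as int8
# A CNN reuses a small set of weights across many positions of one input,
# while a large-model decode step re-reads the full weight set every token.
print(f"streaming 3 GB of weights from DRAM: ~{movement_energy_mj(weights_bytes, DRAM_PJ_PER_BYTE):.0f} mJ per step")
print(f"same traffic served from on-chip memory: ~{movement_energy_mj(weights_bytes, SRAM_PJ_PER_BYTE):.0f} mJ per step")
```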



To address these challenges, Arm Technology's self-developed "Zhouyi" NPU has been upgraded in terms of microarchitecture, energy efficiency and parallel processing.


  • Microarchitecture: Given the differences between CNNs and Transformers, the "Zhouyi" NPU is optimized for Transformer workloads while retaining its CNN capabilities, removing bottlenecks seen in real-world computation.

  • Energy efficiency: Mixed-precision quantization, such as int4 and fp16, is applied at the algorithm and toolchain level to achieve low-precision computation (a quantization sketch follows this list). Data is losslessly compressed and sparsity is exploited to raise effective bandwidth, and In-NPU interconnect technology further expands bus bandwidth.

  • Parallel processing: Data parallelism or model parallelism, load balancing, and tiling are used to reduce data movement.
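As a minimal sketch of the low-precision quantization mentioned in the energy efficiency item above, the snippet below applies symmetric per-tensor int4 quantization to a weight matrix and dequantizes it back to floating point. It is a generic NumPy illustration, not the actual "Zhouyi" toolchain scheme.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric per-tensor quantization to integers in [-8, 7] plus one fp scale."""
    scale = float(np.max(np.abs(w))) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # int4 codes held in int8 storage
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate fp32 weights from the int4 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
print("max abs quantization error:", float(np.max(np.abs(w - w_hat))))
```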


Bao Minqi also introduced the next-generation "Zhouyi" NPU architecture in detail. It includes a Task Schedule Manager that adapts to multi-tasking scenarios (a generic scheduling sketch follows below), the overall architecture is scalable, it matches high-bandwidth DRAM, and it adds OCM (optional on-chip SRAM) to support algorithms with special requirements.
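A hedged sketch of what multi-task scheduling on a shared accelerator can look like in the simplest case is shown below: jobs carry priorities and the most urgent one runs next. This is a generic priority queue for illustration, not the actual Task Schedule Manager.

```python
# Generic multi-task scheduling illustration; task names and priorities are hypothetical.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:
    priority: int                      # lower value = more urgent (e.g. ADAS over infotainment)
    name: str = field(compare=False)
    est_ms: float = field(compare=False)

queue: list[Task] = []
heapq.heappush(queue, Task(2, "cabin-monitor", 4.0))
heapq.heappush(queue, Task(0, "lane-detect", 2.5))
heapq.heappush(queue, Task(1, "voice-assistant", 8.0))

while queue:
    task = heapq.heappop(queue)        # always run the most urgent queued task next
    print(f"run {task.name} (priority {task.priority}, ~{task.est_ms} ms)")
```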



Bao Minqi emphasized the "Zhouyi" NPU's support for heterogeneous computing in his speech, and pointed out that, from the perspective of energy efficiency and the area of the entire SoC (system-on-chip), heterogeneous computing is the best choice for edge AI chips. He explained that, across different application scenarios, heterogeneous computing allows computing power to be flexibly tailored and minimizes unnecessary power consumption.
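The toy sketch below illustrates the general idea of heterogeneous dispatch: route each operator to an assumed best-fit compute unit and fall back to the CPU when no accelerator supports it. The operator-to-unit mapping is hypothetical and not a description of any particular SoC's partitioning.

```python
# Toy heterogeneous dispatch; the operator sets below are hypothetical.
NPU_OPS = {"conv2d", "matmul", "attention"}   # assumed NPU-friendly operators
GPU_OPS = {"resize", "color_convert"}         # assumed GPU-friendly operators

def dispatch(op_name: str) -> str:
    """Pick a compute unit for one operator, falling back to the CPU."""
    if op_name in NPU_OPS:
        return "NPU"
    if op_name in GPU_OPS:
        return "GPU"
    return "CPU"

graph = ["resize", "conv2d", "matmul", "topk", "attention"]
print({op: dispatch(op) for op in graph})
```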


An expert across application domains


Zhouyi "NPU" has demonstrated its powerful performance and flexibility in multiple key areas, especially in automotive applications, AI accelerator cards, and AIoT scenarios.




In automotive applications, different scenarios correspond to different computing power requirements: an in-vehicle infotainment system does not demand much computing power, while ADAS applications often run multiple tasks at once, so the requirement rises sharply. The "Zhouyi" NPU covers a computing power range of 20-320 TOPS and can be tailored to the required computing power on demand. Bao Minqi said that SiEngine Technology's "Dragon Eagle No. 1" chip, which integrates the "Zhouyi" NPU, has shipped more than 400,000 units in total and has been successfully applied in more than 20 main models from Geely's Lynk & Co and Galaxy series and FAW Hongqi.




In AI accelerator card applications, the "Zhouyi" NPU can interact efficiently with different types of host processors (Host AP) in smart cars, PCs, robots, and other systems, and process data in forms such as audio, images, and video. This multimodal model support allows the "Zhouyi" NPU to maintain high performance and flexibility in complex data environments. In AIoT scenarios, devices are often under strict area and power constraints; even so, the "Zhouyi" NPU can still provide efficient computing power while ensuring a high degree of security. This makes it an ideal choice for many application scenarios.


Bao Minqi concluded that the next-generation "Zhouyi" NPU will inherit and significantly enhance the strengths of the previous generation, such as strong computing power, easy deployment, and programmability, and will continue to be optimized in accuracy, bandwidth, scheduling management, operator support, and other areas. At the same time, the NPU should adapt not only to today's storage media but also to the various storage media of the future, so that it can better meet current and future market needs.

