Large AI models are moving from the cloud onto end devices. This shift speeds up data processing and lifts the intelligent experience to a new level: running on the device, large models can respond quickly to even subtle user needs, extending AI into every corner of daily life.
Recently, at the 12th China Hard Technology Industry Chain Innovation Trend Summit and Hundred Media Forum hosted by EEVIA, Bao Minqi, product director at Arm Technology (Arm China), delivered a keynote speech titled "Opportunities of AI Application Chips on the Edge, NPU Accelerates the Upgrade of Terminal Computing Power". He analyzed the broad prospects for edge AI and described the latest progress of Arm Technology's in-house NPU.
The rise of edge AI
The jump in computing power driven by AIGC large models is the biggest opportunity for edge AI. Bao Minqi noted that recent product launches from leading manufacturers show the industry has reached a consensus on the value of edge AI applications.
At present, the large models actually deployed on devices, both in China and abroad, mostly have fewer than 10 billion parameters. This limit stems mainly from the memory bandwidth of end-side devices, which typically ranges from 50 to 100 GB/s. To meet users' demand for real-time responses, models with 1 to 3 billion parameters are the best fit under current bandwidth conditions: they respond quickly and deliver good-quality results while remaining efficient.
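The link between memory bandwidth and model size can be made concrete with a back-of-envelope estimate. The sketch below assumes (our assumption, not a figure from the talk) that batch-1 token generation is memory-bound and that every weight is read from DRAM once per generated token, so bandwidth divided by weight size gives an upper bound on tokens per second. KV-cache and activation traffic are ignored, and the int4/fp16 configurations and the function name are hypothetical.

```python
# Back-of-envelope upper bound on on-device LLM decode speed.
# Assumptions (illustrative only): batch-1 decoding is memory-bound and each
# weight is read from DRAM once per generated token; KV-cache and activation
# traffic are ignored, so real-world speeds will be lower.

def max_decode_tokens_per_s(params_billion: float,
                            bytes_per_param: float,
                            bandwidth_gb_s: float) -> float:
    """Tokens/s bound = memory bandwidth / bytes of weights read per token."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 = GB
    return bandwidth_gb_s / weights_gb

if __name__ == "__main__":
    # Hypothetical configurations: (parameters in billions, bytes per parameter).
    # 2.0 bytes = fp16 weights, 0.5 bytes = int4 weights.
    configs = [(1, 2.0), (3, 0.5), (7, 0.5), (10, 0.5)]
    for params, bpp in configs:
        low = max_decode_tokens_per_s(params, bpp, 50)    # 50 GB/s device
        high = max_decode_tokens_per_s(params, bpp, 100)  # 100 GB/s device
        print(f"{params:>2}B params @ {bpp} B/param: {low:5.1f}-{high:5.1f} tokens/s")
```

Under these assumptions, a 3B-parameter model with int4 weights tops out at roughly 33 to 67 tokens/s across the 50 to 100 GB/s range, while a 10B model drops to about 10 to 20 tokens/s, which is consistent with the point that 1 to 3B models fit current bandwidth best.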
Leading device makers such as OPPO, vivo, Xiaomi, Honor and Huawei are actively promoting end-side AI. They have not only developed large models suited to on-device deployment but also integrated them closely with specific business scenarios. Chip makers likewise agree that the AI NPU (neural network processing unit) will be a focus of future consumer electronics. With a specially optimized hardware architecture, an NPU can greatly boost the AI computing capability of end-side devices while reducing power consumption.
Despite the strong momentum of end-side AI, Bao Minqi stressed that cloud-side AI should not be abandoned; rather, the two should complement each other to deliver the greatest benefit. End-side AI offers timeliness and the security that comes from keeping data local: because processing happens on the device, user privacy is better protected and responses can be real-time. Cloud-side AI offers stronger reasoning and large-scale data processing, so it can handle more complex tasks. Combining the two gives users a more comprehensive and efficient AI experience.
Human-computer interfaces have evolved from physical buttons to touch screens and voice interaction, and now to AI agents, with each change greatly improving the user experience. The next trend is multimodality: combining inputs such as images, audio, and video so that devices understand user needs more fully. By observing and learning, future AI systems will be better able to anticipate and meet user expectations, achieving true intelligence.
Responding to triple challenges with triple upgrades
The rapid development of edge AI has brought triple challenges to hardware devices: cost, power consumption, and ecosystem.
The cost challenge stems mainly from device storage capacity, memory bandwidth, and chip computing resources. Power consumption is driven by the large volume of data that must be moved; unlike CNNs, large models cannot reuse data heavily, which pushes power consumption up further. Finally, the ecosystem challenge lies in the continuous optimization and support of development tools.
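One way to see why large models reuse data less than CNNs is to count how many multiply-accumulate operations (MACs) each weight takes part in per inference step. In a convolution layer, each kernel weight is applied once per output position; in token-by-token LLM decoding with a batch size of 1, each weight of a linear layer is used only once per token. The minimal sketch below compares the resulting operations per byte of weight traffic. The 56 x 56 feature map, the int8 weight width, and the helper names are illustrative assumptions, not figures from the talk, and activation traffic is ignored.

```python
# Illustrative comparison of weight reuse (ignores activation traffic).

def conv2d_weight_reuse(out_h: int, out_w: int) -> int:
    """In a 2-D convolution, each kernel weight is applied once per output position."""
    return out_h * out_w

def matvec_weight_reuse(batch: int = 1) -> int:
    """In LLM decoding, each linear-layer weight is used once per sequence in the batch."""
    return batch

def ops_per_weight_byte(reuse: int, bytes_per_weight: float) -> float:
    """Arithmetic intensity w.r.t. weight traffic: 2 ops (multiply + add) per MAC."""
    return 2 * reuse / bytes_per_weight

if __name__ == "__main__":
    # Hypothetical layers with int8 (1-byte) weights.
    cnn = ops_per_weight_byte(conv2d_weight_reuse(56, 56), bytes_per_weight=1)
    llm = ops_per_weight_byte(matvec_weight_reuse(1), bytes_per_weight=1)
    print(f"CNN layer:  {cnn:.0f} ops per weight byte")
    print(f"LLM decode: {llm:.0f} ops per weight byte")
```

With these numbers, the convolution performs thousands of operations for every weight byte fetched, while batch-1 decoding performs only two, so the decode step spends far more of its energy moving weights than computing on them.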
To address these challenges, Arm Technology's self-developed "Zhouyi" NPU has been upgraded in terms of microarchitecture, energy efficiency and parallel processing.
- Microarchitecture: given the differences between CNNs and Transformers, the "Zhouyi" NPU has been optimized for Transformers while retaining its CNN capabilities, removing bottlenecks in real-world computation.
- Energy efficiency: mixed-precision quantization (for example, int4 and fp16) enables low-precision computation at the algorithm and toolchain level. Lossless data compression and sparsity raise the effective bandwidth, and In-NPU interconnect technology expands the bus bandwidth.
- Parallel processing: data parallelism or model parallelism, combined with load balancing and tiling, reduces data movement.
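To illustrate what low-precision quantization buys in terms of data movement, here is a minimal sketch of symmetric per-tensor int4 weight quantization with two values packed per byte. It is a generic textbook scheme, not the "Zhouyi" toolchain: the function names are hypothetical, the scale is kept as a Python float (a mixed-precision deployment might store it in fp16), and real toolchains typically use finer-grained (per-channel or per-group) scales.

```python
# Generic symmetric per-tensor int4 quantization sketch (not a vendor toolchain).

def quantize_int4(values: list[float]) -> tuple[list[int], float]:
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize_int4(q: list[int], scale: float) -> list[float]:
    """Approximate reconstruction of the original floats."""
    return [x * scale for x in q]

def pack_int4(q: list[int]) -> bytes:
    """Store two 4-bit two's-complement values per byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes((lo & 0x0F) | ((hi & 0x0F) << 4) for lo, hi in zip(q[0::2], q[1::2]))

def unpack_int4(data: bytes, n: int) -> list[int]:
    """Inverse of pack_int4: recover the first n signed 4-bit values."""
    out = []
    for b in data:
        for nib in (b & 0x0F, b >> 4):
            out.append(nib - 16 if nib >= 8 else nib)  # sign-extend the nibble
    return out[:n]

if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.0]
    q, scale = quantize_int4(weights)
    packed = pack_int4(q)
    assert unpack_int4(packed, len(q)) == q  # packing is lossless
    errors = [abs(w - d) for w, d in zip(weights, dequantize_int4(q, scale))]
    print("int4 codes:", q)
    print(f"packed size: {len(packed)} bytes (vs {2 * len(weights)} bytes as fp16)")
    print(f"max abs error: {max(errors):.4f} (bound: scale/2 = {scale / 2:.4f})")
```

Packing is lossless, so the only precision loss comes from rounding to the 16 available levels, bounded by half the scale when no value is clipped; in exchange, the weights occupy a quarter of their fp16 size, which directly reduces the bytes moved per inference.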
Bao Minqi also described the next-generation "Zhouyi" NPU architecture in detail. It includes a Task Schedule Manager suited to multitasking scenarios, is scalable as a whole, adds DRAM to match high-bandwidth demands, and adds OCM (Optional on Chip SRAM) to support algorithms with special requirements.
In his speech, Bao Minqi emphasized the "Zhouyi" NPU's support for heterogeneous computing, arguing that heterogeneous computing is the best choice for edge AI chips in terms of both energy efficiency and overall SoC (system-on-chip) area. Across different application scenarios, he explained, heterogeneous computing allows computing power to be tailored flexibly and minimizes unnecessary power consumption.
Versatile across applications
The "Zhouyi" NPU has demonstrated strong performance and flexibility in several key areas, notably automotive applications, AI accelerator cards, and AIoT scenarios.
In automotive applications, different scenarios call for different amounts of computing power. An in-vehicle infotainment system needs relatively little, but ADAS applications often run multiple tasks at once, which raises the requirement sharply. The "Zhouyi" NPU scales from 20 to 320 TOPS, so computing power can be tailored to demand. Bao Minqi said that SiEngine Technology's "Dragon Eagle No. 1", which integrates the "Zhouyi" NPU, has shipped more than 400,000 units in total and is used in more than 20 mainstream models across Geely's Lynk & Co and Galaxy series and FAW Hongqi.
In AI accelerator cards, the "Zhouyi" NPU can work efficiently with different types of host processors (host APs), such as those in smart cars, PCs, and robots, and can process audio, images, video, and other data types. This multimodal support lets it maintain high performance and flexibility in complex data environments. In AIoT scenarios, devices usually face strict area and power constraints; even so, the "Zhouyi" NPU delivers efficient computing power while maintaining a high level of security, making it a strong fit for a wide range of scenarios.
Finally, Bao Minqi said that the next-generation "Zhouyi" NPU will inherit and significantly strengthen the previous generation's advantages, such as high computing power, easy deployment, and programmability, while continuing to improve accuracy, bandwidth, scheduling management, and operator support. The NPU must also be designed to work with both current and future storage media so that it can better meet present and future market needs.