With the rollout of 5G, the cost-effectiveness of the Internet of Things has become apparent, and the trends toward industrial digitization and urban intelligence have grown increasingly clear. More and more companies and cities are adding disruptive concepts such as digital twins to their IoT initiatives to improve productivity and efficiency, reduce costs, and accelerate the construction of new smart cities. Notably, digital twin technology has been written into China's "14th Five-Year Plan", providing national strategic guidance for building digital twin cities.
To illustrate digital twins, consider the unmanned retail concept stores launched by Amazon and JD.com a few years ago, which effectively turn an offline retail store into the equivalent of an online Taobao storefront. Before shopping, a customer only needs to open the app and complete face-recognition login in the settings. Once face recognition succeeds, scanning one's face at the entrance automatically links the account and opens the door. After shopping, there is no need to queue for manual checkout; the customer simply scans their face again and leaves. The store appears unstaffed, but behind the scenes artificial intelligence tracks everything, and every move a consumer makes is captured by cameras. For example, if you pick up a product and examine it repeatedly, it suggests strong interest, even if some concern kept you from buying it and you ultimately chose another product. Such data is captured and analyzed in depth to build a database, and recommendations can then be pushed periodically based on your shopping records and consumption habits.
This example shows the convenience of digitizing the physical world. Vision is one of the most important means by which humans perceive the world. The foundation of an intelligent society is digitization, and perception is the prerequisite for digitizing the physical world: the type, quantity, and quality of front-end visual perception determine how intelligent our society can become. In other words, the foundation of the intelligent future is "perception + computing". AI vision will play a critical role in this process and has very broad application prospects. Some industry analysts believe that digital twin technology is about to move beyond manufacturing into fields that integrate the Internet of Things, artificial intelligence, and data analytics. This is why we chose this entrepreneurial direction.
As the most important entrance from the physical world into the digital twin world, vision chips are receiving widespread attention, especially AI visual perception chips that can reconstruct 80%-90% of the physical world.
So what is an AI visual perception chip? From the perspective of demand, it needs two major capabilities: to see clearly, and to understand what it sees. The AI-ISP is responsible for seeing clearly; the AI-NPU is responsible for understanding.
Figure | Technical features of AI vision chips
In fact, in a broad sense, any chip that accelerates AI workloads can be called an AI chip, and the module dedicated to speeding up AI algorithms is usually called an NPU (neural network processing unit). NPU-accelerated AI vision chips are already widely used in big data, intelligent driving, and image processing.
According to the latest data from IDC, the accelerated server market reached US$5.39 billion in 2021, up 68.6% year-on-year. GPU servers dominate with roughly 90% of that market, while non-GPU accelerated servers such as ASICs and FPGAs grew 43.8% to US$630 million, an 11.6% share. This means that neural network processors have moved past the early pilot stage and are becoming a key requirement in artificial intelligence deployments. So today we will talk about the AI-NPU, which underpins both "seeing clearly" and "understanding".
Why does "seeing clearly" also depend on the AI-NPU? Intuitively, "seeing clearly" is easy to understand. At night, for example, we want to see things more clearly, but images from traditional cameras are often overexposed, with color details washed out and noise around moving people and distant buildings. How, then, can we "see clearly" in such situations? In fact, a vision chip cannot see clearly without the high computing power of the AI-NPU.
Figure | Night video effect comparison
Take smart cities as an example, where 5-megapixel cameras are used for intelligent analysis. Conventional video-quality enhancement relies on traditional ISP technology, which leaves heavy noise in low-light scenes. AI-ISP solves this problem and still delivers clear images in low light. However, AI-ISP must apply its AI algorithms at full resolution and full frame rate; shortcuts such as reducing resolution or skipping frames are not acceptable, because the human eye is very sensitive to flicker in image quality. For a 5-megapixel video stream, full-resolution, full-frame-rate processing places very high demands on the NPU's computing power.
In intelligent analysis scenarios such as vehicle detection and license plate recognition, a common practice is to record 5-megapixel video at 30 fps, run detection only every 3 to 5 frames, and downscale to 720p for each detection. This approach cannot recognize license plates far away in the frame and may miss fast-moving vehicles. The solution is to detect at full resolution and a higher frame rate wherever possible, which again places very high demands on the NPU's computing power. A rough back-of-the-envelope calculation of those demands follows below.
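Here is a minimal sketch of that calculation in Python. The ops-per-pixel figure is a hypothetical assumption chosen only for illustration; real AI-ISP and detection networks vary widely in cost.

```python
# Back-of-the-envelope estimate of the NPU compute needed for full-resolution,
# full-frame-rate AI processing of a 5-megapixel video stream.

PIXELS = 5_000_000     # 5-megapixel sensor
FPS = 30               # full frame rate
OPS_PER_PIXEL = 1_000  # assumed cost of a lightweight per-pixel network (hypothetical)

pixel_rate = PIXELS * FPS                  # pixels processed per second
required_ops = pixel_rate * OPS_PER_PIXEL  # operations per second

print(f"Pixel throughput: {pixel_rate / 1e6:.0f} Mpixel/s")   # 150 Mpixel/s
print(f"Required compute: {required_ops / 1e12:.2f} TOPS")    # 0.15 TOPS
# Even at a modest 1,000 ops/pixel this is 0.15 TOPS of *usable* compute;
# heavier networks push the requirement into the multi-TOPS range.
```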
In addition, as mentioned earlier, beyond seeing clearly we also need to understand. Understanding means performing intelligent analysis, and intelligent analysis likewise requires the high computing power of the AI-NPU. We can look at this from two perspectives.
First, AI is ultimately a tool for improving efficiency, and it must eventually land in concrete scenarios; this is the shift from the early "AI+" concept to the more recent "+AI". So what can AI do once it lands in an industry? Quite a lot. For example, neural networks can replace the expert systems of some industries, which is equivalent to installing such an "expert" inside the AI chip. That expert must be smart enough, which corresponds to a smarter, larger network. A larger network is like a larger brain: it can hold and store more weights, which places high demands on NPU computing power.
Second, from the deployment perspective, most model training today runs on servers with abundant computing power, while deployment happens on devices where computing power is limited. A model can only be put into real applications after its computational load is reduced to what the device side can run, so a model compression step is required, and model compression demands considerable skill from engineers. If the device side has relatively high computing power, this step can be shortened. The situation is similar to the history of embedded software development: in the early days, limited by compute, engineers had to squeeze every bit of performance out of the hardware and wrote programs in assembly; once computing power grew, they could develop in C instead. In other words, it is feasible to trade some computing power for higher development efficiency and faster AI deployment, but this approach in turn raises the bar for NPU computing power.
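To make the model compression step concrete, below is a minimal sketch using PyTorch's post-training dynamic quantization, one common compression technique (not necessarily the one any particular chip vendor uses). The toy model is hypothetical and stands in for a real vision backbone.

```python
# Minimal sketch: shrink a trained model's weights from fp32 to int8
# with PyTorch post-training dynamic quantization.
import torch
import torch.nn as nn

# A toy model standing in for a vision backbone (hypothetical).
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
model.eval()

# Convert Linear layers to int8 weights; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # Linear layers are now dynamically quantized int8 modules
```

The point of the paragraph above is that with more device-side computing power, steps like this (and the accuracy tuning that follows them) can be lighter or skipped entirely.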
Above, we analyzed the forces driving AI visual perception chip companies to develop high-performance, high-computing-power NPUs. Actually building such chips, however, is very difficult.
As we all know, computing power is an important indicator of NPU performance. Yet the computing power of many early AI chips was only a nominal value that real workloads could never reach: a chip might claim 1 TOPS, but in actual operation only 200 to 400 GOPS is usable. For this reason, people now prefer the more practical metrics FPS/W and FPS/$ to measure how efficiently advanced algorithms actually run on a computing platform.
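The gap between nominal and usable computing power is exactly why FPS/W is more telling than TOPS. A small illustration with purely hypothetical numbers, chosen only to mirror the "1 TOPS claimed, 200-400 GOPS usable" situation described above:

```python
# Why FPS/W beats nominal TOPS as an efficiency metric.
# All figures are hypothetical, for illustration only.

def fps_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt of power."""
    return fps / watts

chips = {
    "A": {"nominal_tops": 1.0, "fps": 60, "watts": 4.0},  # low utilization
    "B": {"nominal_tops": 0.5, "fps": 90, "watts": 2.5},  # high utilization
}

for name, c in chips.items():
    print(f"Chip {name}: {c['nominal_tops']} TOPS nominal, "
          f"{fps_per_watt(c['fps'], c['watts']):.1f} FPS/W")
# Chip B "loses" on paper TOPS but wins on the metric that matters in deployment.
```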
Figure | Design difficulties and driving forces of AI-NPU
In the field of autonomous driving, when Tesla unveiled the FSD chip in 2019, Musk compared it with the Nvidia Drive PX2 previously used in Tesla vehicles: "In terms of computing power, FSD is three times that of Drive PX2, but when performing autonomous driving tasks, its FPS is 21 times that of the latter."
In the field of AI vision chips, Aixin Yuanzhi released its first high-performance, low-power artificial intelligence vision processor chip, the AX630A. In comparisons of different neural networks running on public datasets, it processed 3,116 and 1,356 frames per second respectively, far exceeding comparable chips, while consuming only about 3 W.
Figure | AX630A product block diagram
So what exactly widens the utilization gap between these NPUs? Behind it lie the memory wall and the power wall. The memory wall means that when we raise the nominal computing power by stacking MAC units, data bandwidth must keep pace; otherwise the MAC units constantly stall waiting for data, and effective performance drops. The power wall comes from two sides: the MAC units and the DRAM. Stacking more MAC units increases their total power consumption and also demands higher bandwidth support. On the server side, more expensive HBM can supply that bandwidth, but memory power consumption inevitably rises; on the device side, cost rules out HBM, and there is no particularly good DDR solution.
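The memory wall can be made concrete with the classic roofline model: attainable throughput is the minimum of the compute roof (peak MAC throughput) and the memory roof (bandwidth times arithmetic intensity). A minimal sketch with hypothetical numbers:

```python
# Roofline-style sketch of the memory wall: stacking MAC units raises peak
# compute, but attainable throughput is capped by DDR bandwidth unless the
# workload has enough arithmetic intensity. All numbers are hypothetical.

def attainable_tops(peak_tops: float, bandwidth_gbps: float,
                    ops_per_byte: float) -> float:
    """Roofline: min(compute roof, memory roof)."""
    memory_roof = bandwidth_gbps * ops_per_byte / 1000  # GB/s * ops/B -> TOPS
    return min(peak_tops, memory_roof)

PEAK_TOPS = 4.0  # MAC array peak (hypothetical)
DDR_BW = 12.8    # GB/s, a plausible end-side DDR budget (hypothetical)

for intensity in (50, 200, 500):  # ops per byte fetched from DDR
    print(f"{intensity:>3} ops/byte -> "
          f"{attainable_tops(PEAK_TOPS, DDR_BW, intensity):.2f} TOPS usable")
# At low arithmetic intensity the MAC array starves on memory: the chip never
# reaches its nominal 4 TOPS no matter how many MAC units are added.
```

This is why simply stacking MAC units, without matching bandwidth and a power budget to feed them, only inflates the nominal TOPS figure.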