Actions Technology's Zhou Zhengyu on "Actions Intelligence: The Future of Edge AI Audio Chips"

Last updated: 2024-11-08

ChatGPT has sparked people's curiosity and opened up their imagination. With the unprecedented adoption of generative AI, demand for AI computing power has surged. As with traditional computing, popularizing AI and tapping its full potential requires distributing AI computation sensibly between cloud servers and end-side devices (such as PCs, mobile phones, cars and IoT devices), rather than letting the cloud carry the entire AI load. This architecture, in which cloud and end-side AI work together, is called hybrid AI, and it promises more powerful, more efficient and better optimized AI. In other words, making AI truly accessible and deeply embedded in everyday scenarios depends on end-side AI.

Edge AI brings machine learning to every IoT device and reduces reliance on cloud computing power. It can deliver a low-latency AI experience when there is no network connection or the network is congested, and it offers significant advantages such as low power consumption, strong data privacy and personalization. One of the most important carriers of AIoT is the battery-powered, ultra-low-power small IoT device, a category that is huge in number and rich in applications. In the new wave of AI, edge AI is the key to ubiquitous artificial intelligence, and enabling AI on battery-powered, low-power IoT devices is the key to making edge AI a reality.

On November 5, 2024, Dr. Zhou Zhengyu, Chairman and CEO of Actions Technology Co., Ltd., was invited to attend the Aspencore 2024 Global CEO Summit. Against the backdrop of the AI boom and the new trends that edge AI is driving, he delivered a keynote speech titled "Actions Intelligence: The Future of Edge AI Audio Chips" and presented Actions Technology's innovative technologies and flagship products in low-power edge AI audio.

Dr. Zhou Zhengyu said that across the wide range of applications from edge AI to generative AI, different AI applications have markedly different requirements for computing resources, and many edge AI applications are specialized and need neither large models nor large amounts of computing power. This is especially true in the AIoT field, represented by voice interaction, audio processing, predictive maintenance and health monitoring.

Actions Technology aims for energy-efficient AI computing power on battery-powered IoT devices running small and medium-sized machine learning models

In battery-powered IoT devices such as portable and wearable products, Actions Technology is committed to achieving TOPS-level AI computing power at milliwatt-level power consumption, to meet the low-power, high-efficiency requirements of IoT devices. Taking wearable products (earphones and watches) as an example, average power consumption is between 10mW and 30mW and storage is less than 10MB, which defines the resource budget for low-power end-side AI, and for wearable devices in particular.

Dr. Zhou Zhengyu pointed out that "Actions Intelligence" is a strategy for bringing battery-powered edge AI into real products. It focuses on battery-powered, low-power audio edge AI applications with models of fewer than 10 million parameters (10M), and is committed to delivering 0.1-1 TOPS of general-purpose AI computing power at 10mW-100mW for low-power AIoT devices. In other words, "Actions Intelligence" targets an AI energy efficiency of 10-100 TOPS/W. According to ABI Research's forecast, the edge AI market is growing rapidly: by 2028, the number of edge AI devices based on small and medium-sized models is expected to reach 4 billion, a compound annual growth rate of 32%. By 2030, 75% of such AIoT devices are expected to use dedicated, highly energy-efficient hardware.
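The relationship between the throughput window, the power window and the efficiency target quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch; the helper name `tops_per_watt` is chosen here for illustration and does not come from Actions Technology.

```python
def tops_per_watt(tops: float, milliwatts: float) -> float:
    """Energy efficiency in TOPS/W for a given throughput and power draw."""
    return tops / (milliwatts / 1000.0)


# Corners of the window quoted in the text: 0.1-1 TOPS at 10-100 mW.
corners = {
    (tops, mw): tops_per_watt(tops, mw)
    for tops in (0.1, 1.0)
    for mw in (10.0, 100.0)
}

for (tops, mw), eff in sorted(corners.items()):
    print(f"{tops:>4} TOPS at {mw:>5} mW -> {eff:6.1f} TOPS/W")
```

The 10-100 TOPS/W target corresponds to 0.1 TOPS at 10mW or 1 TOPS at 100mW (both 10 TOPS/W) up to 1 TOPS at 10mW (100 TOPS/W). The remaining corner, 0.1 TOPS at 100mW, works out to only 1 TOPS/W and falls outside the stated target.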

Although existing general-purpose CPU and DSP solutions offer very good algorithmic flexibility, their computing power and energy efficiency fall far short of these goals. According to public information from ARM and Cadence, on the same 28/22nm process, an ARM A7 CPU running at 1.2GHz delivers a theoretical 0.01 TOPS at 100mW, an ideal-case energy efficiency of only 0.1 TOPS/W; a HiFi4 DSP running at 600MHz delivers a theoretical 0.01 TOPS at 40mW, an ideal-case energy efficiency of 0.25 TOPS/W. Even ARM Zhouyi, a dedicated neural network accelerator (NPU) IP with much better energy efficiency, reaches only 2 TOPS/W.
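To make the size of the gap concrete, the sketch below recomputes the efficiency figures quoted above and compares them with the lower bound of the 10 TOPS/W target. The function and variable names are illustrative only; the input numbers are the ones given in the text.

```python
TARGET_TOPS_PER_WATT = 10.0  # lower bound of the target stated in the article


def efficiency_tops_per_watt(tops: float, milliwatts: float) -> float:
    """Ideal-case efficiency from theoretical throughput and power."""
    return tops / (milliwatts / 1000.0)


# (theoretical TOPS, power in mW) as quoted in the text.
baselines = {
    "ARM A7 CPU @ 1.2 GHz": efficiency_tops_per_watt(0.01, 100.0),
    "HiFi4 DSP @ 600 MHz": efficiency_tops_per_watt(0.01, 40.0),
    "ARM Zhouyi NPU": 2.0,  # quoted directly as TOPS/W
}


def gap_to_target(eff: float, target: float = TARGET_TOPS_PER_WATT) -> float:
    """How many times more efficient a design must become to hit the target."""
    return target / eff


for name, eff in baselines.items():
    print(f"{name}: {eff:.2f} TOPS/W, {gap_to_target(eff):.0f}x short of target")
```

On these figures the CPU is about 100 times short of the target, the DSP about 40 times, and the NPU about 5 times.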

The fundamental reason for the relatively poor energy efficiency of these traditional technologies is the von Neumann computing architecture. A von Neumann system separates storage from computing and suffers from two bottlenecks, the "storage wall" (also known as the memory wall) and the "power consumption wall", which severely limit improvements in system computing power and energy efficiency.

In the von Neumann architecture, the computing unit must first read data from memory and then write the result back to memory once the calculation is complete. As the semiconductor industry has developed, processors and memory have followed different process routes. Because of differences in process, packaging and requirements, memory access speed cannot keep up with the processor's data processing speed. Data transfer behaves like a huge funnel: no matter how much the processor pours in, the memory can only let it trickle through. The narrow data channel between the two, and the high energy consumption that results, have built a "storage wall" between storage and computing.

In addition, under the traditional architecture, moving data from the memory unit to the computing unit costs many times more power than the computation itself, so only a small share of the energy and time is actually spent on computing. The frequent movement of data between memory and processor causes a serious transfer power problem, known as the "power consumption wall".

SRAM-based in-memory computing is currently the best solution for low-power edge AI

Dr. Zhou Zhengyu said that the way to weaken or eliminate the "storage wall" and "power consumption wall" is to adopt a Computing-in-Memory (CIM) structure. The core idea is to move some or all of the computation into the memory, so that the storage unit itself can compute. Data no longer needs a separate computing component: it is stored and processed in the same storage unit, eliminating the latency and power cost of data access. This is a true fusion of storage and computing. And because computation happens inside the memory, finer-grained parallelism becomes possible, which greatly improves performance and especially energy efficiency.

Machine learning algorithms are built on large numbers of matrix operations, which lend themselves to distributed parallel processing, so in-memory computing is very well suited to artificial intelligence applications.
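Why matrix operations parallelize so naturally can be seen in a plain matrix-vector product: every output element is an independent multiply-accumulate over one row of weights, so each row could in principle be processed where it is stored. The sketch below is a generic software illustration of that structure, not a model of the MMSCIM circuit.

```python
def row_mac(weight_row, activations):
    """Multiply-accumulate for one output element (one stored weight row)."""
    return sum(w * x for w, x in zip(weight_row, activations))


def matvec(weights, activations):
    """Matrix-vector product: one independent MAC chain per weight row.

    The rows do not depend on each other, so they could all be
    evaluated in parallel next to the memory holding each row.
    """
    return [row_mac(row, activations) for row in weights]


# Small INT8-style example with hand-picked values.
weights = [
    [1, -2, 3],
    [0, 4, -1],
]
activations = [2, 1, 0]
print(matvec(weights, activations))  # [0, 4]
```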

When computing inside memory, the choice of storage medium is critical and largely determines cost. A single chip is the preferred form: Actions Technology's goal is to integrate low-power edge AI computing power with the other SoC modules on one chip, so DDR RAM and Flash, which require special processes, are ruled out. That leaves SRAM and emerging NVRAM (such as RRAM or MRAM), which are available in the standard CMOS processes used for SoCs. SRAM is very mature and scales along with advanced process nodes. It offers fast reads and writes, high energy efficiency and unlimited write endurance. Its one drawback is low storage density, but for the computing power that most edge AI applications need, this is not an obstacle. In the short term, SRAM is the best technical path to high energy efficiency on low-power edge AI devices, and it can be deployed quickly without mass production risk.

In the long run, emerging NVRAM such as RRAM offers higher density than SRAM and lower read power, and it can also be integrated into SoCs, which leaves room for more ambitious in-memory computing architectures. However, the RRAM process is not yet mature, and large-scale mass production still carries risk. Its most advanced process node is only 22nm, and it has a serious flaw: limited write endurance, beyond which the cells are permanently damaged. Dr. Zhou Zhengyu therefore expects that once RRAM technology matures, a hybrid of SRAM and RRAM could become the best technical path. AI computations that require frequent writes can run on SRAM CIM, while those with infrequent or bounded writes can run on RRAM CIM. Such a hybrid is expected to deliver greater computing power and higher energy efficiency.

Actions Technology innovatively uses analog-digital hybrid design to achieve SRAM-based computing in memory (CIM)

Two mainstream approaches to SRAM-based CIM circuits have been disclosed in the industry. The first uses digital circuits to implement computing as close to the SRAM as possible. Because the computing unit does not actually enter the SRAM array, this is in essence only near-memory computing. The second exploits the characteristics of analog devices in the SRAM medium to perform analog computing. This path achieves true CIM, but its shortcomings are obvious. On the one hand, analog computing loses precision, and consistency across mass-produced parts cannot be guaranteed: the same chip cannot be relied on to produce the same output at different times and in different environments. On the other hand, analog CIM needs ADCs and DACs to exchange information with the other digital modules, which constrains the overall data flow and interface design and makes it hard to improve operating efficiency.

Actions Technology takes a different route, implementing CIM with a mixed analog-digital circuit: a customized analog design is used to build digital computing circuits inside the SRAM medium. This achieves true CIM while preserving computing accuracy and mass production consistency.

Dr. Zhou Zhengyu believes that Actions Technology's choice of the Mixed-Mode SRAM based CIM (MMSCIM) technology path has the following significant advantages:

First, its energy efficiency is higher than that of a purely digital implementation and almost equal to that of a purely analog implementation;

Second, it needs no ADC or DAC, and as a digital implementation it inherently offers high accuracy, high reliability and mass production consistency;

Third, it is easy to migrate to newer process nodes and to port the design between different fabs;

Fourth, it is easy to raise the clock speed and optimize performance, power and area (PPA);

Fifth, its adaptive handling of sparse matrices further reduces power consumption and improves energy efficiency.

For high-quality audio processing and voice applications, Actions Technology regards MMSCIM as the best low-power end-side AI audio architecture going forward. By reducing data movement between memory and compute units, it can significantly cut latency, improve performance, and reduce power consumption and heat. For anyone seeking to enable AI on battery-powered IoT devices, where energy efficiency is paramount and every milliwatt must yield as much AI computing power as possible, the company presents MMSCIM as the best way to put end-side AI into practice.

Dr. Zhou Zhengyu announced the MMSCIM roadmap of Actions Technology for the first time. The roadmap shows:

1. Actions Technology's first-generation (GEN1) MMSCIM launched in 2024. GEN1 MMSCIM uses a 22nm process. Each core provides 100 GOPS of computing power, with an energy efficiency of up to 6.4 TOPS/W @INT8;

2. In 2025, Actions Technology will launch the second-generation (GEN2) MMSCIM. GEN2 MMSCIM also uses a 22nm process and delivers three times the performance of the first generation. Each core provides 300 GOPS of computing power and directly supports Transformer models, and energy efficiency rises to 7.8 TOPS/W @INT8;

3. In 2026, the third-generation (GEN3) MMSCIM will launch on a new 12nm process. Each GEN3 MMSCIM core will reach 1 TOPS of computing power, support Transformer models, and further raise energy efficiency to 15.6 TOPS/W @INT8.

Each generation of MMSCIM can raise total computing power by combining multiple cores. For example, a single GEN2 MMSCIM core provides 300 GOPS, so four cores together exceed 1 TOPS.
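The roadmap figures can be combined into two simple derived quantities: the number of cores needed to reach a given total throughput, and the per-core power implied by dividing throughput by efficiency. The sketch below is illustrative arithmetic only. The implied power values are not stated in the article and assume the quoted efficiency holds at full per-core throughput.

```python
import math

# Per-core figures as quoted in the roadmap (INT8).
ROADMAP = {
    "GEN1": {"year": 2024, "process_nm": 22, "gops_per_core": 100, "tops_per_watt": 6.4},
    "GEN2": {"year": 2025, "process_nm": 22, "gops_per_core": 300, "tops_per_watt": 7.8},
    "GEN3": {"year": 2026, "process_nm": 12, "gops_per_core": 1000, "tops_per_watt": 15.6},
}


def implied_core_power_mw(gops_per_core: float, tops_per_watt: float) -> float:
    """Implied power at full load (an assumption, not a published spec).

    GOPS / (TOPS/W) = 1e9 / 1e12 W = 1e-3 W, so the quotient is in mW.
    """
    return gops_per_core / tops_per_watt


def cores_needed(target_gops: float, gops_per_core: float) -> int:
    """Smallest number of cores whose combined throughput meets the target."""
    return math.ceil(target_gops / gops_per_core)


for name, gen in ROADMAP.items():
    power = implied_core_power_mw(gen["gops_per_core"], gen["tops_per_watt"])
    cores = cores_needed(1000, gen["gops_per_core"])
    total = cores * gen["gops_per_core"]
    print(f"{name}: ~{power:.1f} mW/core implied, "
          f"{cores} core(s) for 1 TOPS ({total} GOPS total)")
```

For GEN2 this reproduces the example in the text: four 300 GOPS cores give 1,200 GOPS, which is more than 1 TOPS.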

Actions Technology officially releases a new generation of MMSCIM-based end-side AI audio chip

Actions Technology has successfully implemented the first generation of MMSCIM, which achieves 0.1 TOPS of computing power at 500MHz with an energy efficiency of 6.4 TOPS/W. Thanks to its adaptive handling of sparse matrices, a model with reasonable sparsity (that is, with a certain proportion of zero-valued parameters) achieves even better energy efficiency, reaching or exceeding 10 TOPS/W depending on the degree of sparsity. Building on this core technology, Actions Technology has created its next-generation end-side AI audio chip platform, combining low power consumption, high computing power and high energy efficiency.
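How sparsity could lift efficiency from 6.4 TOPS/W toward 10 TOPS/W can be illustrated with an idealized model. The sketch below assumes that a multiply-accumulate with a zero weight is skipped and costs no energy, and that throughput is still counted in dense-equivalent operations. This is a simplifying assumption made here for illustration. The article does not describe how MMSCIM exploits sparsity internally, and real gains will depend on the hardware.

```python
DENSE_TOPS_PER_WATT = 6.4  # GEN1 figure quoted in the text


def sparse_dot(weights, activations):
    """Dot product that skips zero weights.

    Returns (result, macs_performed) so the saved work is visible.
    """
    result, macs = 0, 0
    for w, x in zip(weights, activations):
        if w == 0:
            continue  # idealized: a skipped MAC costs nothing
        result += w * x
        macs += 1
    return result, macs


def effective_efficiency(dense_eff: float, sparsity: float) -> float:
    """Idealized efficiency when a fraction `sparsity` of MACs is skipped."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return dense_eff / (1.0 - sparsity)


def sparsity_needed(dense_eff: float, target_eff: float) -> float:
    """Fraction of zero weights needed to reach target_eff in this model."""
    return max(0.0, 1.0 - dense_eff / target_eff)


print(sparse_dot([0, 2, 0, -1], [5, 3, 7, 4]))  # (2, 2)
print(f"{sparsity_needed(DENSE_TOPS_PER_WATT, 10.0):.0%} zero weights "
      "would reach 10 TOPS/W under this idealized model")
```

Under these assumptions, roughly 36% zero-valued weights would be enough to move from 6.4 to 10 TOPS/W, which is in line with the article's description of "reasonable sparsity".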

Dr. Zhou Zhengyu, on behalf of Actions Technology, officially released a new generation of MMSCIM-based end-side AI audio chips, including three chip series:

The first series is ATS323X, which is aimed at the low-latency private wireless audio field;
The second series is ATS286X, which is aimed at the Bluetooth AI audio field;
The third series is ATS362X, which is aimed at the AI DSP field.

All three series use a three-core heterogeneous architecture: CPU (ARM) + DSP (HiFi5) + NPU (MMSCIM). Actions Technology's R&D team combined MMSCIM with the advanced HiFi5 DSP to form the "Actions Intelligence NPU (AI-NPU)" architecture, in which the two compute collaboratively to give an NPU that is both highly flexible and energy-efficient. Within AI-NPU, MMSCIM handles the basic, general-purpose AI operators at low power and high computing power. Because new AI models and operators keep emerging, special operators that MMSCIM does not cover are handled by the HiFi5 DSP.
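The division of labor described above, with common operators on MMSCIM and everything else falling back to the HiFi5 DSP, can be sketched as a simple dispatch rule. The operator names and the supported set below are hypothetical placeholders. The article does not list which operators MMSCIM covers, and this is not the actual scheduling logic of the chips.

```python
# Hypothetical set of operators handled by the CIM engine (an assumption).
CIM_SUPPORTED_OPS = {"conv2d", "fully_connected", "matmul"}


def assign_engine(op_name: str) -> str:
    """Send covered operators to MMSCIM and the rest to the HiFi5 DSP."""
    return "MMSCIM" if op_name in CIM_SUPPORTED_OPS else "HiFi5 DSP"


def partition(model_ops):
    """Return the execution plan as a list of (operator, engine) pairs."""
    return [(op, assign_engine(op)) for op in model_ops]


# A made-up operator sequence for a small audio model.
example_model = ["conv2d", "conv2d", "custom_gated_activation", "fully_connected"]
for op, engine in partition(example_model):
    print(f"{op:>24} -> {engine}")
```

The point of the fallback is flexibility: a new operator that the fixed-function engine does not cover still runs, only at the DSP's lower energy efficiency.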

All of these edge AI chips support AI models with fewer than 1 million parameters on chip, and can be extended to models with up to 8 million parameters through off-chip PSRAM. Actions Technology has also built a dedicated AI development tool for AI-NPU, "ANDT", which supports industry-standard AI frameworks and formats such as TensorFlow, HDF5, PyTorch and ONNX. ANDT can also automatically partition a given AI algorithm between the CIM and the HiFi5 DSP for execution. ANDT is a key tool for building Actions Technology's low-power edge audio AI ecosystem: with the ANDT toolchain, algorithms can be integrated easily, helping developers bring products to market quickly.
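The two capacity tiers described above amount to a simple placement rule based on parameter count. The sketch below encodes the limits quoted in the text. The function name and return labels are invented for illustration and are not part of ANDT, and the treatment of a model with exactly 1 million parameters is an assumption.

```python
ON_CHIP_PARAM_LIMIT = 1_000_000  # "fewer than 1 million parameters on chip"
PSRAM_PARAM_LIMIT = 8_000_000    # "up to 8 million parameters" with PSRAM


def model_placement(num_params: int) -> str:
    """Where a model of the given size would have to live, per the text."""
    if num_params < 0:
        raise ValueError("parameter count cannot be negative")
    if num_params < ON_CHIP_PARAM_LIMIT:
        return "on-chip"
    if num_params <= PSRAM_PARAM_LIMIT:
        return "off-chip PSRAM"
    return "exceeds stated limits"


# 717K and 935K are the CNN model sizes mentioned in the article's benchmarks.
for params in (717_000, 935_000, 5_000_000, 10_000_000):
    print(f"{params:>10,} parameters -> {model_placement(params)}")
```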

According to the comparison of the energy efficiency ratio of the first generation MMSCIM and HiFi5 DSP published by Dr. Zhou Zhengyu:

When Actions Technology's GEN1 MMSCIM and the HiFi5 DSP both run the same 717K-parameter convolutional neural network (CNN) model at 500MHz for environmental noise reduction, MMSCIM reduces power consumption by nearly 98% compared with the HiFi5 DSP, a 44-fold improvement in energy efficiency. With a 935K-parameter CNN model for speech recognition, MMSCIM reduces power consumption by 93% compared with the HiFi5 DSP, a 14-fold improvement in energy efficiency.

In addition, in tests with more complex network models for environmental noise reduction, MMSCIM reduces power consumption relative to the HiFi5 DSP by 89% with a Deep Recurrent Neural Network model, by 88% with a Convolutional Recurrent Neural Network model, and by 76% with a Convolutional Deep Recurrent Neural Network model.

Finally, under the same conditions, when running a CNN Conv2D operator model, the measured AI computing power of GEN1 MMSCIM is 16.1 times higher than that of the HiFi5 DSP.
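The power-reduction percentages and the efficiency multiples quoted in these comparisons are two views of the same measurement. Assuming both processors do the same work in the same time (an assumption made here, reasonable for real-time audio but not stated in the article), an N-fold efficiency gain corresponds to a power reduction of 1 - 1/N, and vice versa. A minimal sketch:

```python
def power_reduction_from_gain(gain: float) -> float:
    """Fractional power saving implied by an N-fold efficiency gain."""
    return 1.0 - 1.0 / gain


def gain_from_power_reduction(reduction: float) -> float:
    """Efficiency multiple implied by a fractional power saving."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


# Check the quoted pairs: 44x vs. "nearly 98%", 14x vs. 93%.
for gain in (44, 14):
    print(f"{gain}x -> {power_reduction_from_gain(gain):.1%} less power")

# Implied multiples for the recurrent models (derived here, not quoted).
for model, reduction in (("DRNN", 0.89), ("CRNN", 0.88), ("CDRNN", 0.76)):
    print(f"{model}: {reduction:.0%} less power -> "
          f"~{gain_from_power_reduction(reduction):.1f}x")
```

Under that assumption the quoted pairs are consistent (44x corresponds to about 97.7%, 14x to about 92.9%), and the recurrent-model results would correspond to gains of roughly 9x, 8x and 4x. The abbreviations DRNN, CRNN and CDRNN are shorthand used only in this sketch.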

In summary, the latest generation of MMSCIM-based edge AI audio chips from Actions Technology is expected to have a far-reaching impact on the industry and to set a new direction for edge AI technology.

Actions Intelligence helps the AI ecosystem develop rapidly

From ChatGPT to Sora, and across text-to-text, text-to-image, text-to-video, image-to-text and video-to-text generation, cloud-based large models keep raising people's expectations of AI. However, the road of AI development is still long. The shift from cloud to end-side devices will be a new development trend, and AI is about to enter its second half.

With advantages such as low latency, personalized services and data privacy protection, edge AI plays an increasingly important role in IoT devices and is opening up new possibilities in industries such as manufacturing, automotive and consumer goods. With these new products, built on the mixed analog-digital SRAM CIM technology path, Actions Technology has taken the first step toward low-power edge AI computing power, integrating an AI acceleration engine into its products and launching edge AI audio chips with a heterogeneous CPU + DSP + NPU three-core architecture.

Finally, Dr. Zhou Zhengyu expressed his hope that the "Actions Intelligence" strategy will help make AI truly accessible everywhere. Going forward, Actions Technology will continue to increase its R&D investment in computing power for end-side devices, pursue further gains in computing power and energy efficiency through technological innovation and product iteration, and provide end-side AIoT chips with high energy efficiency, high integration, high performance and high security, promoting the application of AI on end-side devices and supporting the healthy, rapid development of the end-side AI ecosystem.
