As large models inject new vitality into AI, demand for edge AI is also growing. Several major processor IP vendors are expanding their edge AI NPU offerings to offload the CPU, improving efficiency and reducing power consumption.
Recently, Ceva announced the Ceva-NeuPro-Nano NPU, expanding its Ceva-NeuPro edge AI NPU product line.
TinyML (Tiny Machine Learning) is a technology for running machine learning models on resource-constrained microcontrollers and edge devices. Its goal is to implement efficient machine learning algorithms on devices with limited power, memory, and compute, supporting real-time data processing and decision-making.
The growing demand for efficient, specialized AI in IoT devices is driving rapid growth of the TinyML market. ABI Research forecasts that by 2030, more than 40% of TinyML shipments will use dedicated TinyML hardware rather than general-purpose MCUs.
What does TinyML entail?
TinyML has the following four characteristics:
Ultra-low power: Suitable for battery-powered or energy-harvesting devices, with power consumption typically in the milliwatt range.
Small memory footprint: Usually runs on microcontrollers with only a few KB to a few hundred KB of RAM and Flash.
Real-time processing: Supports real-time data processing and response, suitable for Internet of Things (IoT) devices.
Embedded applications: Widely used in smart homes, wearables, the industrial IoT, and other fields.
These four characteristics also define TinyML's main challenges: resource constraints, power management, real-time requirements, and model accuracy.
Running AI on an edge MCU requires more efficient algorithm design, and models must be compressed and quantized.
For example, designers use lightweight architectures suited to embedded systems, such as small convolutional neural networks (CNNs) and recurrent neural networks (RNNs), together with simple activation functions and efficient arithmetic operations to reduce computational overhead.
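To make the footprint argument concrete, here is a minimal sketch that estimates the parameter and MAC counts of a tiny CNN, the kind of budget check done before targeting a KB-scale MCU. The layer shapes are hypothetical, chosen only for illustration:

```python
# Estimate the memory and compute footprint of a hypothetical tiny CNN.
# Layer shapes are illustrative, not taken from any specific product.

def conv2d_cost(in_ch, out_ch, k, out_h, out_w):
    """Parameters and multiply-accumulates (MACs) for one conv layer."""
    weights = in_ch * out_ch * k * k
    params = weights + out_ch              # weights + per-channel biases
    macs = weights * out_h * out_w         # one MAC per weight per output pixel
    return params, macs

layers = [
    conv2d_cost(1, 8, 3, 30, 30),    # 32x32 grayscale input, 3x3 conv
    conv2d_cost(8, 16, 3, 13, 13),   # second conv after 2x2 pooling
]
fc_params = 16 * 6 * 6 * 10 + 10     # flatten (16x6x6) -> 10 classes

total_params = sum(p for p, _ in layers) + fc_params
total_macs = sum(m for _, m in layers) + (fc_params - 10)

print(total_params)  # ~7K parameters: a few KB even before int8 quantization
print(total_macs)    # ~265K MACs per inference: feasible on a Cortex-M
```

At int8, roughly 7 KB of weights and a few hundred thousand MACs per inference fit comfortably within the KB-scale RAM/Flash budgets described above.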
For model compression and quantization, several techniques can be combined:
Model pruning: Removing unimportant parameters from the model to reduce compute and storage requirements.
Weight sharing: Reducing the number of distinct parameters by letting multiple connections share the same weights.
Quantization: Converting floating-point values to fixed-point integers, such as 8-bit or 16-bit, to reduce memory and compute requirements.
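The pruning and quantization techniques above can be sketched in a few lines of NumPy. This is an illustrative toy, with random weights standing in for a trained layer, not a production flow:

```python
import numpy as np

# Hypothetical float32 weight tensor standing in for a trained layer.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 64)).astype(np.float32)

# Magnitude pruning: zero out the smallest ~50% of weights by absolute value.
threshold = np.quantile(np.abs(w), 0.5)
w_pruned = np.where(np.abs(w) < threshold, 0.0, w)

# Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = float(np.abs(w_pruned).max()) / 127.0
w_int8 = np.clip(np.round(w_pruned / scale), -127, 127).astype(np.int8)

# Dequantize to check the reconstruction error introduced by quantization.
w_hat = w_int8.astype(np.float32) * scale
print("sparsity:", float((w_int8 == 0).mean()))
print("max abs error:", float(np.abs(w_hat - w_pruned).max()))
```

The int8 tensor needs a quarter of the float32 storage, and the zeroed half of the weights can be skipped entirely by hardware with sparsity support.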
Currently, there are two main ways to run TinyML workloads. One is to accelerate them on a low-power compute core in the MCU, such as an Arm Cortex-M. The other is to offload the CPU with a dedicated hardware accelerator: several major IP suppliers have launched dedicated TinyML accelerator IP, and some MCU manufacturers have developed their own NPUs, DSPs, or similar accelerators.
The typical development flow starts with data collection and preprocessing, followed by model training and optimization on a high-performance computing system; compression and quantization techniques then convert the final pruned model into a format the embedded processor can execute, after which it is deployed.
Current mainstream TinyML IP suppliers
Several major CPU IP suppliers provide NPU IP, including Arm, Cadence, Synopsys, VeriSilicon, and Ceva. Some offerings target MCUs while others target SoCs, depending on constraints such as processing performance, power consumption, area, and cost.
Arm’s Ethos
The Arm Ethos-U65 is an advanced micro neural processing unit (microNPU) designed for AI in embedded devices. It inherits the high energy efficiency of the Ethos-U55 and doubles its performance in Arm Cortex-A, Cortex-R, and Neoverse-based systems.
Key features of the Ethos-U65 include:
Excellent performance and energy efficiency: Achieves 1 TOPS in a 16nm process, delivering a 2x performance improvement in the smallest area.
Flexible integration: Supports a wide range of operating systems and DRAM, and is suitable for BareMetal or RTOS systems on Cortex-M.
Support for complex AI models: Handle complex workloads, especially those that require extensive AXI interfaces and DRAM support, with performance improvements of up to 150%.
Energy efficient: ML workloads consume up to 90% less energy than on previous-generation Cortex-M processors.
Future-proof: Supports compute-heavy operators such as convolution, LSTM, and RNN, and automatically falls back to the Cortex-M to run other kernels.
Offline Optimization: Improve performance and reduce system memory requirements by up to 90% by compiling and optimizing neural networks offline.
These capabilities enable the Ethos-U65 to serve a wide range of high-performance, low-power embedded devices, such as smart cameras, environmental sensors, industrial automation, and mobile devices. It provides a unified toolchain for developing, deploying, and debugging AI applications, a powerful boost to innovation.
VeriSilicon
VeriSilicon's Vivante VIP9000 processor family provides programmable and scalable solutions for real-time, low-power AI devices. Its patented neural network engine and tensor processing fabric deliver excellent inference performance with industry-leading power and area efficiency. The VIP9000 series scales from 0.5 TOPS to 20 TOPS and suits applications ranging from wearables, IoT, and smart homes to automotive and edge servers.
VIP9000 supports all popular deep learning frameworks and achieves acceleration through technologies such as quantization, pruning, and model compression. Its programmable engine and tensor processing structure support a variety of data types and processing tasks. Through ACUITY Tools SDK and various runtime frameworks, AI applications can be easily ported to the VIP9000 platform for efficient development and deployment.
Ceva
Ceva-NeuPro-Nano is a highly efficient, self-contained edge NPU designed for TinyML applications for AIoT devices. Its performance ranges from 10 GOPS to 200 GOPS, supporting always-on applications for battery-powered devices such as hearables, wearables, home audio, smart home, and smart factory. It can run independently without the need for a main CPU/DSP, including code execution and memory management. It supports 4, 8, 16, and 32-bit data types, with native Transformer calculations, sparsity acceleration, and fast quantization. With Ceva-NetSqueeze technology, memory usage is reduced by 80%. Ceva NeuPro-Studio AI SDK is provided, which works seamlessly with open source AI inference frameworks such as TFLM and µTVM, covering voice, vision, and sensing use cases. Two configurations, Ceva-NPN32 and Ceva-NPN64, meet a wide range of application needs, providing optimal power efficiency and small silicon area.
Cadence
Cadence's Tensilica Neo NPU is a high-performance, low-power neural processing unit (NPU) designed for embedded AI. It serves applications ranging from sensors and audio to voice/speech recognition, vision, and radar. The Neo NPU is highly scalable, with single-core performance from 256 to 32K 8x8-bit MACs per cycle, up to 80 TOPS, and can scale further through multi-core configurations to cover everything from ultra-low-power IoT devices to high-performance AR/VR and automotive systems.
Neo NPU supports data types such as Int4, Int8, Int16, and FP16, and has mixed-precision computing capabilities, optimizing the balance between performance and accuracy. Its architecture supports a variety of neural network topologies, including classic and generative AI networks, and can offload the burden of the main processor. The built-in compression/decompression function effectively reduces system memory usage and bandwidth consumption.
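The precision/accuracy balance behind mixed-precision support can be illustrated with a toy NumPy experiment. Synthetic Gaussian data and symmetric fixed-point rounding are assumptions made purely for illustration; real deployments quantize trained tensors per layer:

```python
import numpy as np

# Synthetic activations standing in for real tensor data.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=10_000).astype(np.float32)

def quant_error(x, bits):
    """RMS reconstruction error of symmetric fixed-point quantization."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 127 for int8
    scale = float(np.abs(x).max()) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)    # quantize
    return float(np.sqrt(np.mean((q * scale - x) ** 2)))

for bits in (4, 8, 16):
    print(f"int{bits}: RMS error {quant_error(x, bits):.6f}")
```

Each extra bit roughly halves the quantization error, which is why a mixed-precision engine can keep most layers at int8 (or int4) and reserve int16/FP16 for the few layers that are accuracy-sensitive.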
Neo NPU can run at a typical clock frequency of up to 1.25GHz, providing excellent computing performance in a 7nm process. It integrates Cadence's NeuroWeave SDK, supports a unified software development environment, simplifies model deployment and optimization processes, and provides efficient and flexible AI solutions to meet the needs of a variety of embedded AI applications.
Synopsys
The Synopsys ARC NPX6 NPU IP family is the industry's highest-performance neural processing unit (NPU) IP, designed to meet the real-time computing needs of AI applications with ultra-low power consumption. The family includes ARC NPX6 and NPX6FS, supports the latest complex neural network models, including generative AI, and provides up to 3,500 TOPS of performance for intelligent SoC designs.
A single instance of the ARC NPX6 NPU IP can deliver up to 250 TOPS in a 5nm process, rising to 440 TOPS with sparsity features; configurations with multiple NPU instances can reach 3,500 TOPS. The ARC NPX6 scales from 1K to 96K MACs and is compatible with CNNs, RNN/LSTM, and emerging networks such as Transformers. It supports INT4/8/16 precision, with optional BF16 and FP16.
The ARC NPX6FS NPU IP is designed for functional safety and meets ISO 26262 ASIL D standards for automotive and other safety-critical applications. It features dual-core lockstep processors and self-checking safety monitoring to meet mixed criticality and virtualization requirements.
The ARC MetaWare MX Development Toolkit from Synopsys includes a compiler, debugger, neural network software development kit (SDK), virtual platform SDK, runtimes and libraries, and advanced simulation models. The toolkit automatically partitions algorithms across the MAC resources for efficient processing, simplifying development.
Latest update time: 2024-11-16 09:50