Translated from Semiwiki, by Bernard Murphy
Custom AI acceleration is on the rise. In cloud computing, Alibaba is following Amazon and Google in launching its own custom accelerator. Facebook is also in the game, and Microsoft has a large stake in Graphcore. Intel and Mobileye have a strong edge AI presence in the automotive space, while wireless infrastructure developers are adding AI capabilities to small cells and base stations for 5G. All of these applications depend on considerable flexibility and future-proofing to stay relevant over the long term in a rapidly evolving environment.
But there are many applications for which power, cost, or a transparent usage model is the more important metric. Consider an agricultural monitor in a remote location, a voice controller for a microwave oven, or traffic sensors distributed across a large city. For these problems, a general solution, or even a general AI solution, may be overkill, and an application-specific AI capability will be more compelling.
Before the AI era, you would immediately think of a hardware accelerator: something that does exactly what it has to do, but much faster than a piece of software running on a CPU. That is also what an AI accelerator does. It may still be software-driven, but in a different way from a general-purpose CPU. The software is developed in Python on a large framework (such as TensorFlow or Torch), and then compiled to the target accelerator in multiple steps.
That’s where the magic happens. As long as the accelerator stays within the general confines of a neural network architecture, it can be as wild as you want it to be. It can support multiple convolution engines, each backed by shared SRAM, along with local memory to optimize access for high-priority operations.
It might support specialized functions for common operations such as pooling. To improve speed and performance, it will often support different word widths at different stages of inference, and specialized optimizations for sparse arrays. Both are hot areas for innovation in neural network architectures, with some architects even experimenting with single-bit weights: if a weight can only be 1 or 0, then you don't need multiplications in convolutions, and sparsity increases!
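To make the single-bit weight idea concrete, here is a minimal Python sketch. The function names and the sample data are illustrative assumptions, not taken from the article or from any particular accelerator. It shows that when every weight is 0 or 1, a dot product (the core of a convolution) reduces to adding up the activations whose weight is 1, and zero weights can be skipped altogether.

```python
def dot_full(weights, activations):
    """Reference dot product: one multiplication per weight."""
    return sum(w * a for w, a in zip(weights, activations))


def dot_binary(weights, activations):
    """Dot product for weights restricted to 0 or 1 (hypothetical helper).

    A weight of 1 passes the activation through and a weight of 0
    contributes nothing, so no multiplications are needed.
    Returns the result and the number of additions performed.
    """
    total = 0
    adds = 0
    for w, a in zip(weights, activations):
        if w:  # zero weights are skipped entirely, exploiting sparsity
            total += a
            adds += 1
    return total, adds


# Illustrative data: 8 weights, only 3 of them non-zero.
weights = [1, 0, 0, 1, 1, 0, 0, 0]
activations = [3, -2, 7, 5, -1, 4, 0, 9]

result, adds = dot_binary(weights, activations)
assert result == dot_full(weights, activations)
print(result, adds)  # prints "7 3"
```

The full version performs eight multiplications for this input, while the binary version performs three additions and no multiplications. In hardware, that difference is what translates into area and power savings.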
The challenge with all of this is that when you want to commit to a final architecture, you find that there are so many knobs that it’s hard to know where to start, or if you’ve really explored the full space of possibilities. To complicate matters further, you need to test and characterize on a wide range of large test cases (large images, speech samples, etc.).
It makes sense to run most of your testing in C rather than RTL, since a C model runs orders of magnitude faster and is easier to tune. In addition, neural network algorithms map well to high-level synthesis (HLS), so your C model can be more than just a model: it can also generate RTL. You can explore the power, performance, and area implications of the choices you are considering, such as multiple convolution processors, local memory, word width, and broadcast updates. All of this comes with fast turnaround time, allowing you to explore the range of possible optimizations more fully.
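As a rough illustration of exploring one of these knobs in a fast software model, the sketch below sweeps the word width used for the weights of a single dot product and reports the error against a floating-point reference. It is written in Python rather than C purely for brevity. The fixed-point scheme, the function names, and the sample data are all assumptions made for illustration; a real flow would run a C model over large test sets and also weigh power and area.

```python
def quantize(x, bits, scale=1.0):
    """Round x to signed fixed-point with `bits` bits over [-scale, scale)."""
    levels = 2 ** (bits - 1)
    q = round(x / scale * levels)
    q = max(-levels, min(levels - 1, q))  # saturate instead of wrapping
    return q * scale / levels


def dot(ws, xs):
    """Plain floating-point dot product."""
    return sum(w * x for w, x in zip(ws, xs))


def sweep_word_widths(weights, inputs, widths):
    """Return {bits: absolute error vs. the unquantized dot product}."""
    reference = dot(weights, inputs)
    errors = {}
    for bits in widths:
        quantized = [quantize(w, bits) for w in weights]
        errors[bits] = abs(dot(quantized, inputs) - reference)
    return errors


# Illustrative data only.
weights = [0.3, -0.7, 0.12, 0.9]
inputs = [1.0, 2.0, -1.5, 0.5]

errors = sweep_word_widths(weights, inputs, widths=[2, 4, 8, 16])
for bits, err in errors.items():
    print(f"{bits:2d}-bit weights: error {err:.6f}")
```

For this data the error shrinks as the word width grows. The design question is how narrow each stage can go before accuracy on the real test cases becomes unacceptable.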