Neural network architecture - making AI truly alive

Translated from Semiwiki, Bernard Murphy

Custom AI acceleration is on the rise. In cloud computing, Alibaba has followed Amazon and Google in launching its own custom accelerator. Facebook is also in the game, and Microsoft has a large stake in Graphcore. Intel and Mobileye have a strong edge AI position in the automotive space, while wireless infrastructure developers are adding AI capabilities to small cells and base stations for 5G. All of these applications depend on flexibility and future-proofing to stay relevant in a rapidly evolving environment.

AI Traditional Hardware Solutions

But there are many applications for which power, cost, or a transparent usage model is a more important metric: an agricultural monitor in a remote location, a voice controller for a microwave, or traffic sensors distributed across a large city. For these problems, a general solution, or even a general AI solution, may be overkill, and an application-specific AI capability will be more compelling.

Before the AI era, you would immediately think of a hardware accelerator: it does only what it has to do, but much faster than a piece of software running on a CPU. An AI accelerator does the same thing. It may still be software-driven, but in a different way than a general-purpose CPU. The software is developed in Python on a large platform (such as TensorFlow or Torch), and then compiled to the target accelerator in multiple steps.

That’s where the magic happens. As long as the accelerator stays within the general confines of a neural network architecture, it can be as wild as you want it to be. It can support multiple convolution engines, each with its own local memory and all backed by a shared SRAM, so that the most frequent accesses are served locally.

It might provide specialized functions for common operations such as pooling. To improve performance, it will often support different word widths at different stages of inference, along with specialized optimizations for sparse arrays. Both are hot areas for innovation in neural network architectures, with some architects even experimenting with single-bit weights - if a weight can only be 1 or 0, you don't need multiplications in convolutions, and sparsity increases!
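To make the single-bit point concrete, here is a minimal Python sketch, not taken from the original article. It computes a 1-D convolution in the sliding-window form commonly used in neural networks (no kernel flip), once with multiplications and once with additions only. The function names `conv1d_multiply` and `conv1d_binary` and the sample data are hypothetical, chosen purely for illustration.

```python
def conv1d_multiply(signal, weights):
    """Reference sliding-window convolution using one multiply per tap."""
    k = len(weights)
    return [
        sum(signal[i + j] * weights[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv1d_binary(signal, weights):
    """Same computation for weights restricted to 0 or 1: additions only."""
    if any(w not in (0, 1) for w in weights):
        raise ValueError("weights must be 0 or 1")
    # Keep only the tap positions whose weight is 1; zero weights are
    # skipped entirely, which is where the sparsity benefit comes from.
    taps = [j for j, w in enumerate(weights) if w == 1]
    return [
        sum(signal[i + j] for j in taps)
        for i in range(len(signal) - len(weights) + 1)
    ]


signal = [3, -1, 4, 1, -5, 9, 2]   # illustrative input samples
weights = [1, 0, 1]                # single-bit weights
result = conv1d_binary(signal, weights)
print(result)
```

Both functions return the same values for binary weights, but the second never multiplies and touches only two of the three taps. In hardware, that would correspond to replacing multipliers with adders and skipping the zero-weight positions.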

The challenge with all of this is that when you want to commit to a final architecture, you find that there are so many knobs that it’s hard to know where to start, or if you’ve really explored the full space of possibilities. To complicate matters further, you need to test and characterize on a wide range of large test cases (large images, speech samples, etc.).

It makes sense to run most of your testing in C rather than RTL, since a C model runs orders of magnitude faster and is easier to tune. In addition, neural network algorithms map well to high-level synthesis (HLS), so your C model can be more than just a model: it can also generate the RTL. You can explore the power, performance, and area implications of the choices you are considering - multiple convolution processors, local memory, word width, broadcast updates - all with fast turnaround, allowing you to explore the range of possible optimizations more fully.
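As a rough illustration of exploring one of these knobs in a fast software model, the sketch below sweeps the word width and measures how far a fixed-point dot product drifts from a floating-point reference. It is written in Python for brevity, whereas the flow described above would use a C model that HLS can also synthesize. The helpers `quantize` and `sweep_word_widths`, the random test data, and the chosen widths are all assumptions made for this example, and the sketch measures numerical error only, not power or area.

```python
import random


def quantize(values, bits, max_abs=1.0):
    """Round values onto a signed fixed-point grid of the given word width."""
    levels = 2 ** (bits - 1) - 1      # largest positive integer code
    scale = max_abs / levels          # value of one least-significant bit
    out = []
    for v in values:
        code = round(v / scale)
        code = max(-levels, min(levels, code))   # saturate out-of-range values
        out.append(code * scale)
    return out


def dot(a, b):
    """Plain dot product, the core operation inside a convolution."""
    return sum(x * y for x, y in zip(a, b))


def sweep_word_widths(widths, length=256, seed=0):
    """Return the absolute dot-product error for each candidate word width."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(length)]   # stand-in activations
    w = [rng.uniform(-1, 1) for _ in range(length)]   # stand-in weights
    reference = dot(x, w)
    return {
        bits: abs(dot(quantize(x, bits), quantize(w, bits)) - reference)
        for bits in widths
    }


errors = sweep_word_widths([4, 8, 16])
for bits, err in errors.items():
    print(f"{bits:2d}-bit words: absolute error {err:.6f}")
```

A real exploration would run the same kind of sweep over representative images or speech samples, and would pair each accuracy figure with the power and area estimates reported by the HLS tool.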
