Learn about the AI accelerator ecosystem-EEWORLD


Design teams that rely on the traditional RTL flow need a great deal of time to move computationally intensive networks into hardware. The field urgently needs an approach that departs from the RTL flow and substantially improves productivity.

The time has come for the Catapult HLS platform

Fifteen years ago, Mentor recognized the need for design and verification teams to move from the RTL to the HLS level and developed the Catapult® HLS platform, which provides a complete flow from C++ to optimized RTL (Figure 1).

Figure 1: Catapult HLS Platform

The Catapult HLS platform gives algorithm designers a hardware design solution that generates high-quality RTL from C++/SystemC descriptions and targets ASIC, FPGA, or eFPGA. The platform can check the design for errors before synthesis, provides a seamless and reusable test environment for functional verification and coverage analysis, and supports formal equivalence checking between the generated RTL and the original HLS source.

Benefits of this solution include:

Supports late-stage changes. You can change the C++ algorithm at any time and regenerate the RTL, or retarget the design to a new process technology.

Supports hardware evaluation. Power, performance, and area options can be quickly explored without changing the original code.

Accelerates schedules. Design and verification time drops from one year to a few months, new features can be added in days, and the C/C++ source needs about one-fifth as many lines of code as the equivalent RTL.

AI Accelerator Ecosystem

At the same time, Mentor has deployed an AI accelerator ecosystem in the Catapult HLS platform (Figure 2), providing AI designers with an environment that allows them to quickly launch projects.

Figure 2: Catapult AI Accelerator Ecosystem

AC Math Library

All functions in the Algorithmic C Math (AC Math) library are written as C++ templates with parameterized types, so designers can specify numerical precision to suit the target application. Many functions are offered with different approximation strategies. The natural logarithm, for example, is available in two forms: a piecewise-linear approximation and a CORDIC form. The former is smaller and faster when a small loss of accuracy is acceptable; the latter is slower but much more accurate. In all cases, the source can be customized to meet the design goals. Each function or block comes with detailed design documentation and a C++ testbench. Because the Catapult HLS flow reuses the C++ testbench, it is easy to verify the accuracy of the RTL against the source design.
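To make the accuracy trade-off concrete, here is a minimal sketch of a piecewise-linear natural logarithm written as a C++ template. It is not the AC Math implementation: the name `ln_pwl`, the `Segments` parameter, and the use of built-in floating-point types in place of Algorithmic C fixed-point types are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical sketch, not the AC Math API: piecewise-linear (PWL) natural
// logarithm with the number of segments chosen at compile time.
template <typename T, std::size_t Segments>
T ln_pwl(T x) {
    static_assert(Segments > 0, "need at least one segment");

    // Normalize x = m * 2^e with m in [1, 2). Assumes x > 0.
    int e = 0;
    T m = x;
    while (m >= T(2)) { m /= T(2); ++e; }
    while (m < T(1))  { m *= T(2); --e; }

    // Find the segment that contains m.
    const T width = T(1) / T(Segments);
    std::size_t i = static_cast<std::size_t>((m - T(1)) / width);
    if (i >= Segments) i = Segments - 1;

    // Interpolate linearly between the segment endpoints. A hardware
    // version would read y0 and y1 from a small constant table.
    const T x0 = T(1) + T(i) * width;
    const T y0 = std::log(x0);
    const T y1 = std::log(x0 + width);
    const T ln_m = y0 + (y1 - y0) * (m - x0) / width;

    // ln(x) = ln(m) + e * ln(2)
    return ln_m + T(e) * T(0.6931471805599453);
}
```

In this sketch, fewer segments mean a smaller table and a larger error: `ln_pwl<double, 4>(2.7)` is further from `std::log(2.7)` than `ln_pwl<double, 32>(2.7)` is.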

The categories of mathematical functions in this library include:

Piecewise-linear functions: absolute value, normalization, reciprocal, logarithm and exponent (natural and base 2), square root, inverse square root, and sine/cosine/tangent (normal and inverse)

Activation functions, such as hyperbolic tangent, sigmoid, and leaky ReLU

Linear algebra functions such as matrix multiplication and Cholesky decomposition
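Two of the listed categories can be illustrated with a short sketch: a fixed-size matrix multiplication and a leaky ReLU activation, both templated on the element type. The names `Matrix`, `matmul`, and `leaky_relu` are hypothetical and do not come from AC Math; built-in types stand in for the fixed-point types a real design would likely use.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch, not the AC Math API. Sizes are compile-time
// constants, as is typical for code intended for synthesis.
template <typename T, std::size_t R, std::size_t C>
using Matrix = std::array<std::array<T, C>, R>;

// out (M x N) = a (M x K) * b (K x N)
template <typename T, std::size_t M, std::size_t K, std::size_t N>
Matrix<T, M, N> matmul(const Matrix<T, M, K>& a, const Matrix<T, K, N>& b) {
    Matrix<T, M, N> out{};
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            T acc = T(0);
            for (std::size_t k = 0; k < K; ++k) {
                acc += a[i][k] * b[k][j];  // multiply-accumulate
            }
            out[i][j] = acc;
        }
    }
    return out;
}

// Leaky ReLU: pass positive values through, scale negative ones by alpha.
template <typename T>
T leaky_relu(T x, T alpha) {
    return x >= T(0) ? x : alpha * x;
}
```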

AC DSP Library

The Algorithmic C DSP (AC DSP) library defines synthesizable C++ functions commonly required by DSP designers, such as filters and FFTs. These functions are designed as C++ classes, allowing designers to easily instantiate many variations of objects to create complex DSP subsystems. As with the AC Math library, input and output parameters are parameterized so that arithmetic can be performed at the desired fixed-point precision, providing a high degree of flexibility in performing area and performance tradeoffs for synthesized hardware.
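As a rough illustration of the class-based style described above, the following sketch shows a FIR filter whose data, coefficient, and accumulator types are template parameters. The class name `FirFilter` and its interface are assumptions for this example, not the AC DSP API, and built-in integer types stand in for fixed-point types.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch, not the AC DSP API: a direct-form FIR filter with
// parameterized types so precision can be traded against area.
template <typename DataT, typename CoeffT, typename AccT, std::size_t Taps>
class FirFilter {
    static_assert(Taps > 0, "need at least one tap");

public:
    explicit FirFilter(const std::array<CoeffT, Taps>& coeffs)
        : coeffs_(coeffs), delay_{} {}

    // Consume one input sample and produce one output sample.
    AccT step(DataT sample) {
        // Shift the delay line (a shift register in hardware).
        for (std::size_t i = Taps - 1; i > 0; --i) {
            delay_[i] = delay_[i - 1];
        }
        delay_[0] = sample;

        // Multiply-accumulate across all taps.
        AccT acc = AccT(0);
        for (std::size_t i = 0; i < Taps; ++i) {
            acc += AccT(coeffs_[i]) * AccT(delay_[i]);
        }
        return acc;
    }

private:
    std::array<CoeffT, Taps> coeffs_;
    std::array<DataT, Taps> delay_;  // starts at zero
};
```

Instantiating the class several times with different tap counts or types, for example `FirFilter<short, int, long, 3>` next to a wider variant, is the kind of variation the class-based design is meant to make easy.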

The AC DSP library contains:

Filter functions such as FIR, 1-D moving average, and polyphase decimation

Fast Fourier transform (FFT) functions, such as radix-2^2 single delay feedback, radix-2x dynamic in-place, and radix-2 in-place

Image Processing Library

The Algorithmic C Image Processing Library (AC IPL) starts by defining types for some common pixel formats.

The AI accelerator ecosystem also provides a rich set of toolkits built from real, tested accelerator reference designs that teams can study, modify, and copy to jump-start a project. The toolkits ship with Catapult and include configurable C++/SystemC IP source code, documentation, testbenches, and scripts that take the design through the HLS synthesis and verification flow. They demonstrate coding approaches and techniques for experimenting with trade-offs among performance (latency), frame rate, area, and power.

Pixel-Pipe Video Processing Toolkit

The video processing toolkit demonstrates a real-time image processing application using the pixel-pipe accelerator (Figure 3). The accelerator block is implemented using a C++ class hierarchy. The block scales down the image, converts the image from color to monochrome, performs edge detection, and then scales up the image. A user-space application is executed on the CPU under Xilinx® PetaLinux, which allows software control to turn the edge detection block on or off. The toolkit documentation shows how to integrate the block into a Xilinx board using Xilinx IP so that the team can demonstrate the system.
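The stages of such a pipeline can be sketched in plain C++. This is not the toolkit's code: the type and function names are hypothetical, the edge detector is a simple sum of absolute differences rather than whatever operator the toolkit uses, and for brevity the scaling stages operate on monochrome data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> px;  // row-major
    std::uint8_t at(int x, int y) const {
        return px[static_cast<std::size_t>(y) * width + x];
    }
    void set(int x, int y, int v) {
        px[static_cast<std::size_t>(y) * width + x] = static_cast<std::uint8_t>(v);
    }
};

inline GrayImage make_image(int w, int h) {
    return GrayImage{w, h, std::vector<std::uint8_t>(static_cast<std::size_t>(w) * h, 0)};
}

// Color to monochrome with integer luma weights (77 + 150 + 29 = 256).
inline std::uint8_t to_mono(Rgb p) {
    return static_cast<std::uint8_t>((77 * p.r + 150 * p.g + 29 * p.b) >> 8);
}

// Scale down by 2 in each dimension by averaging 2x2 blocks.
inline GrayImage scale_down(const GrayImage& in) {
    GrayImage out = make_image(in.width / 2, in.height / 2);
    for (int y = 0; y < out.height; ++y)
        for (int x = 0; x < out.width; ++x) {
            const int sum = in.at(2 * x, 2 * y) + in.at(2 * x + 1, 2 * y) +
                            in.at(2 * x, 2 * y + 1) + in.at(2 * x + 1, 2 * y + 1);
            out.set(x, y, sum / 4);
        }
    return out;
}

// Edge strength = |horizontal difference| + |vertical difference|,
// clamped to 255. The last row and column are left at zero.
inline GrayImage detect_edges(const GrayImage& in) {
    GrayImage out = make_image(in.width, in.height);
    for (int y = 0; y + 1 < in.height; ++y)
        for (int x = 0; x + 1 < in.width; ++x) {
            const int gx = in.at(x + 1, y) - in.at(x, y);
            const int gy = in.at(x, y + 1) - in.at(x, y);
            out.set(x, y, std::min(255, std::abs(gx) + std::abs(gy)));
        }
    return out;
}

// Scale up by 2 in each dimension by pixel replication.
inline GrayImage scale_up(const GrayImage& in) {
    GrayImage out = make_image(in.width * 2, in.height * 2);
    for (int y = 0; y < out.height; ++y)
        for (int x = 0; x < out.width; ++x)
            out.set(x, y, in.at(x / 2, y / 2));
    return out;
}
```

Chaining the stages as `scale_up(detect_edges(scale_down(image)))` mirrors the block order described above; making the edge detector optional would correspond to the software on/off control.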


Figure 3: Pixel-pipe video processing toolkit

2-D Convolution Toolkit

The toolkit shows how to code an Eyeriss processing element (PE) array in C++ to implement 2-D convolutions for image enhancement (sharpening, blurring, and edge detection). Each processing element (Figure 4) performs a 3x1 multiply-accumulate (convolution).
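A 3x3 convolution can be built from 3x1 multiply-accumulates by letting one such operation handle each kernel row and summing the three partial results. The sketch below shows only that arithmetic; it does not model the Eyeriss dataflow or timing, and the function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One processing element: a 3x1 multiply-accumulate of a kernel row
// against three horizontally adjacent pixels.
inline int pe_mac3(const std::array<int, 3>& weights, const int* window) {
    return weights[0] * window[0] + weights[1] * window[1] + weights[2] * window[2];
}

// 3x3 convolution without padding (output is (width-2) x (height-2)).
// The kernel is applied without flipping, as is common for CNN layers.
inline std::vector<int> conv2d_3x3(const std::vector<int>& img, int width, int height,
                                   const std::array<std::array<int, 3>, 3>& kernel) {
    std::vector<int> out;
    if (width < 3 || height < 3) return out;
    out.reserve(static_cast<std::size_t>(width - 2) * (height - 2));
    for (int y = 0; y + 2 < height; ++y) {
        for (int x = 0; x + 2 < width; ++x) {
            int acc = 0;
            // Three PEs, one per kernel row; their partial sums are added.
            for (int r = 0; r < 3; ++r) {
                const std::size_t idx = static_cast<std::size_t>(y + r) * width + x;
                acc += pe_mac3(kernel[r], &img[idx]);
            }
            out.push_back(acc);
        }
    }
    return out;
}
```

Sharpening, blurring, and edge detection then differ only in the nine kernel weights passed in.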

Figure 4: Eyeriss processing element

Tiny YOLO Object Recognition Toolkit

The object recognition toolkit (Figure 5) demonstrates an object recognition application built on a convolution accelerator engine that uses the PE array from the 2-D convolution toolkit. It shows how to achieve high-speed data routing through the AXI4 interconnect (reading kernel weight data from system memory) and how to define a high-performance memory architecture. The toolkit also integrates with TensorFlow, so inference can be tested at the network level in C++.

Figure 5: Tiny YOLO toolkit example - system view

System Integration

Accelerator blocks do not exist in isolation. Catapult HLS provides "interface synthesis" to add timed protocols to the untimed variables of a C++ function interface. Designers only need to set architectural constraints for the protocol in the Catapult GUI. The tool supports typical protocols such as AXI4 video stream, request/acknowledge handshaking, and memory interfaces, so designers can explore interface protocols without changing the C++ source code.
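For readers unfamiliar with the idea, this is roughly what an untimed C++ function interface looks like. The function `apply_gain` and the constant `kSamples` are hypothetical; the point is only that the source contains plain arguments and no clock, handshake, or bus signals, which is what leaves the protocol choice to the tool.

```cpp
#include <cstddef>

constexpr std::size_t kSamples = 8;  // hypothetical block size

// Untimed C++: the interface is just an input array, a scalar, and an
// output array. Nothing here says whether the arrays become memories,
// streams, or handshaked ports in hardware.
void apply_gain(const int in[kSamples], int gain, int out[kSamples]) {
    for (std::size_t i = 0; i < kSamples; ++i) {
        out[i] = in[i] * gain;
    }
}
```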

AXI Example

The AXI Example (Figure 6) shows how to instantiate one or more accelerator elements in an AXI SoC subsystem using the AXI interface IP generated by Catapult HLS. Master, slave, and streaming examples are provided.

Figure 6: AXI example

Basic Processor Example

The basic processor example (Figure 7) builds on the AXI example to show how to connect a machine learning accelerator to a complete processor-based system. The accelerator in this example uses a simple multiply-accumulate architecture with 2-D convolution and max pooling. Several third-party processor IP models are supported, and a software flow (with associated data) is included for bare-metal programming.
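Taking the maximum over each small area of a feature map, commonly called max pooling, can be sketched as follows. The 2x2 window with a stride of 2 and the name `max_pool_2x2` are assumptions for illustration; the source does not state the window size the example uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch: 2x2 max pooling with stride 2 over a row-major
// feature map. Any odd trailing row or column is dropped.
inline std::vector<int> max_pool_2x2(const std::vector<int>& in, int width, int height) {
    std::vector<int> out;
    for (int y = 0; y + 1 < height; y += 2) {
        for (int x = 0; x + 1 < width; x += 2) {
            const std::size_t top = static_cast<std::size_t>(y) * width + x;
            const std::size_t bottom = static_cast<std::size_t>(y + 1) * width + x;
            const int m = std::max(std::max(in[top], in[top + 1]),
                                   std::max(in[bottom], in[bottom + 1]));
            out.push_back(m);
        }
    }
    return out;
}
```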

Figure 7: Example of a basic processor platform
