Design teams that rely on the traditional RTL design flow need a great deal of time to bring computationally intensive networks into hardware. The field urgently needs a method that departs from the RTL flow and substantially improves productivity.
The time has come for the Catapult HLS platform
Fifteen years ago, Mentor recognized the need for design and verification teams to move from the RTL to the HLS level and developed the Catapult® HLS platform, which provides a complete flow from C++ to optimized RTL (Figure 1).
Figure 1: Catapult HLS Platform
The Catapult HLS platform gives algorithm designers a hardware design solution that generates high-quality RTL from C++/SystemC descriptions, targeting ASIC, FPGA, or eFPGA. The platform can check the design for errors before synthesis, provides a seamless and reusable test environment for functional verification and coverage analysis, and supports formal equivalence checking between the generated RTL and the original HLS source.
Benefits of this solution include:
Supports late-stage changes. You can change the C++ algorithm at any time and regenerate the RTL code, or retarget a new process technology.
Supports hardware evaluation. Power, performance, and area options can be explored quickly without changing the original source code.
Accelerates schedules. Design and verification time drops from one year to a few months, new features can be added in days, and the C/C++ source has 5 times fewer lines of code than the equivalent RTL.
AI Accelerator Ecosystem
At the same time, Mentor has deployed an AI accelerator ecosystem in the Catapult HLS platform (Figure 2), providing AI designers with an environment that allows them to quickly launch projects.
Figure 2: Catapult AI Accelerator Ecosystem
AC Math Library
All functions in the Algorithmic C Math (AC Math) library are written with C++ template parameters, allowing designers to specify the numerical precision required by the target application. Many functions are available with different approximation strategies. For example, the natural logarithm is available in two forms: a piecewise linear approximation and a CORDIC form. The former is smaller and faster when a small loss of accuracy is acceptable; the latter is slower but much more accurate. In all cases, the source can be customized to meet the design goals. Each function/block is accompanied by detailed design documentation and a C++ testbench. Because the Catapult HLS platform reuses the C++ testbench, it is easy to verify the accuracy of the RTL against the source design.
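To make the piecewise linear strategy concrete, here is a minimal sketch in plain C++. It is not the AC Math implementation: the function name `pwl_ln`, the `Segments` template parameter, and the use of `double` instead of fixed-point types are all assumptions made for illustration. The input is split into an exponent and a mantissa in [1, 2), and the logarithm of the mantissa is interpolated linearly between segment end points.

```cpp
#include <cmath>

// Hypothetical helper, not part of AC Math: piecewise linear ln(x) for x > 0.
// Segments is a template parameter so accuracy can be traded against the
// size of the lookup table a hardware version would need.
template <int Segments>
double pwl_ln(double x) {
    static_assert(Segments > 0, "need at least one segment");

    // frexp yields a mantissa in [0.5, 1); rescale it to [1, 2).
    int exponent = 0;
    double mantissa = std::frexp(x, &exponent) * 2.0;
    exponent -= 1;

    // Find the segment that contains the mantissa and the position inside it.
    double pos = (mantissa - 1.0) * Segments;
    int idx = static_cast<int>(pos);
    if (idx >= Segments) idx = Segments - 1;  // defensive clamp
    double frac = pos - idx;

    // Segment end points. A hardware version would read these from a small
    // ROM; this sketch computes them with std::log for brevity.
    double y0 = std::log(1.0 + static_cast<double>(idx) / Segments);
    double y1 = std::log(1.0 + static_cast<double>(idx + 1) / Segments);

    // ln(x) = exponent * ln(2) + ln(mantissa), with ln(mantissa) interpolated.
    const double ln2 = 0.6931471805599453;
    return exponent * ln2 + y0 + frac * (y1 - y0);
}
```

Instantiating `pwl_ln<8>` and `pwl_ln<64>` on the same input shows the trade-off described above: more segments mean a larger table and a smaller interpolation error.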
The categories of mathematical functions in this library include:
Piecewise linear functions - absolute value, normalization, reciprocal, logarithms and exponentials (natural and base 2), square root, inverse square root, and sine/cosine/tangent (normal and inverse)
Activation functions, such as hyperbolic tangent, sigmoid, and Leaky ReLU
Linear algebra functions such as matrix multiplication and Cholesky decomposition
AC DSP Library
The Algorithmic C DSP (AC DSP) library defines synthesizable C++ functions commonly required by DSP designers, such as filters and FFTs. These functions are designed as C++ classes, allowing designers to easily instantiate many variations of an object to create complex DSP subsystems. As with the AC Math library, input and output types are parameterized so that arithmetic can be performed at the desired fixed-point precision, giving a high degree of flexibility in trading off area against performance in the synthesized hardware.
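The class-based, parameterized style described here can be sketched as follows. This is an illustration only, not the AC DSP API: the class name `FirFilter`, its `run` method, and the template parameters `T` and `Taps` are hypothetical, and a real design would use a fixed-point type for `T` rather than `int` or `double`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical class for illustration; it is not the AC DSP API.
// T is the sample/coefficient type and Taps the filter length, so a single
// template yields many filter variants.
template <typename T, std::size_t Taps>
class FirFilter {
    static_assert(Taps > 0, "filter needs at least one tap");

public:
    explicit FirFilter(const std::array<T, Taps>& coeffs)
        : coeffs_(coeffs), delay_{} {}

    // Push one input sample and return one output sample.
    T run(T sample) {
        // Shift the delay line; in hardware this maps to a register chain.
        for (std::size_t i = Taps - 1; i > 0; --i) delay_[i] = delay_[i - 1];
        delay_[0] = sample;

        // Multiply-accumulate across all taps.
        T acc = T();
        for (std::size_t i = 0; i < Taps; ++i) acc += coeffs_[i] * delay_[i];
        return acc;
    }

private:
    std::array<T, Taps> coeffs_;
    std::array<T, Taps> delay_;
};
```

Declaring `FirFilter<int, 3>` and `FirFilter<double, 4>` side by side creates two independent filters with different precision and length from the same source, which is the kind of variation the class-based design is meant to make cheap.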
The DSP library contains:
Filter functions such as FIR, 1-D moving average, and polyphase decimation
Fast Fourier transform (FFT) functions, such as radix-2^2 single delay feedback, radix-2x dynamic in-place, and radix-2 in-place
Image Processing Library
The Algorithmic C Image Processing Library (AC IPL) first defines some common pixel formats as type definitions.
The AI accelerator ecosystem also provides a rich set of toolkits: real, tested accelerator reference designs that teams can study, modify, and copy to get projects started quickly. These kits ship with Catapult and include configurable C++/SystemC IP source code, documentation, testbenches, and scripts that take the designs through the HLS synthesis and verification flow. The toolkits demonstrate various methods and coding techniques that can be used to experiment with trade-offs in performance (latency), frame rate, area, or power.
Pixel-Pipe Video Processing Toolkit
The video processing toolkit demonstrates a real-time image processing application using the pixel-pipe accelerator (Figure 3). The accelerator block is implemented using a C++ class hierarchy. The block scales down the image, converts the image from color to monochrome, performs edge detection, and then scales up the image. A user-space application is executed on the CPU under Xilinx® PetaLinux, which allows software control to turn the edge detection block on or off. The toolkit documentation shows how to integrate the block into a Xilinx board using Xilinx IP so that the team can demonstrate the system.
Figure 3: Pixel-pipe video processing toolkit
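As a rough illustration of two of the pixel-pipe stages, the sketch below converts one scan line from color to monochrome and optionally applies a simple horizontal-gradient edge detector, with a flag standing in for the software on/off control. It assumes a great deal: the names `Rgb`, `to_mono`, `edge_line`, and `process_line` are hypothetical, the luma weights and the gradient operator are common choices rather than the toolkit's actual ones, and the scaling stages are omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Integer luma approximation (weights 77/150/29 sum to 256). This is an
// assumed conversion; the toolkit's actual one is not given in the text.
inline std::uint8_t to_mono(Rgb p) {
    return static_cast<std::uint8_t>((77 * p.r + 150 * p.g + 29 * p.b) >> 8);
}

// Horizontal gradient magnitude along one scan line; border pixels are 0.
std::vector<std::uint8_t> edge_line(const std::vector<std::uint8_t>& line) {
    std::vector<std::uint8_t> out(line.size(), 0);
    for (std::size_t x = 1; x + 1 < line.size(); ++x) {
        int g = std::abs(static_cast<int>(line[x + 1]) -
                         static_cast<int>(line[x - 1]));
        out[x] = static_cast<std::uint8_t>(g > 255 ? 255 : g);
    }
    return out;
}

// One line through the pipe: color to monochrome, then edge detection if
// the (software-controlled) enable flag is set.
std::vector<std::uint8_t> process_line(const std::vector<Rgb>& in,
                                       bool edge_enable) {
    std::vector<std::uint8_t> mono;
    mono.reserve(in.size());
    for (Rgb p : in) mono.push_back(to_mono(p));
    return edge_enable ? edge_line(mono) : mono;
}
```

Calling `process_line` with `edge_enable` set to `false` returns the monochrome line unchanged, mirroring the bypass that the user-space application controls.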
2-D Convolution Toolkit
This toolkit shows how to code an Eyeriss [1] processing element (PE) array in C++ to implement 2-D convolutions for image enhancement (sharpening, blurring, and edge detection). Each processing element (Figure 4) performs a 3x1 multiply-accumulate, the basic step of the convolution.
Figure 4: Eyeriss processing element
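One way to picture how 3x1 multiply-accumulate elements combine into a 2-D convolution is the sketch below. It is an assumption-laden illustration, not the toolkit's code: the names `pe_mac3`, `Kernel3x3`, `Image`, and `conv2d_3x3` are hypothetical, the kernel is applied without flipping (correlation form), borders are dropped, and the image is assumed to be rectangular. Each PE handles one kernel row and passes its partial sum to the next.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<int>>;
using Kernel3x3 = std::array<std::array<int, 3>, 3>;

// One processing element: a 3x1 multiply-accumulate over three adjacent
// pixels of one image row. psum_in carries the partial sum from the
// previous PE in the chain.
inline int pe_mac3(const std::array<int, 3>& w, const int* px, int psum_in) {
    return psum_in + w[0] * px[0] + w[1] * px[1] + w[2] * px[2];
}

// 3x3 "valid" convolution (no padding, kernel not flipped), built by
// chaining three PEs per output pixel, one per kernel row.
Image conv2d_3x3(const Image& img, const Kernel3x3& k) {
    if (img.size() < 3 || img[0].size() < 3) return {};
    std::size_t rows = img.size() - 2;
    std::size_t cols = img[0].size() - 2;
    Image out(rows, std::vector<int>(cols, 0));
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            int psum = 0;
            for (std::size_t i = 0; i < 3; ++i)
                psum = pe_mac3(k[i], &img[r + i][c], psum);
            out[r][c] = psum;
        }
    }
    return out;
}
```

Swapping the kernel values switches between sharpening, blurring, and edge detection without touching the PE code.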
TinyYOLO Object Recognition Toolkit
The object recognition toolkit (Figure 5) demonstrates an object recognition application built on a convolution accelerator engine that uses the PE array from the 2-D convolution (Eyeriss) toolkit. The toolkit demonstrates how to achieve high-speed data routing through the AXI4 interconnect (reading core weight data from system memory) and how to define a high-performance memory architecture. It also provides TensorFlow integration, so that the C++ design can be tested for inference at the network level.
Figure 5: TinyYOLO toolkit example - system view
System Integration
Accelerator blocks do not exist in isolation, so Catapult HLS provides "interface synthesis" capabilities that add timed protocols to the untimed interface variables of a C++ function. Designers only need to set architectural constraints for the protocol in the Catapult GUI. The tool supports typical protocols such as AXI4 video stream, request/acknowledge handshakes, and memory interfaces. This allows designers to explore interface protocols without changing the C++ source code.
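For readers unfamiliar with what an untimed interface looks like, the sketch below shows the kind of C++ function that interface synthesis starts from. The function `apply_gain` is a hypothetical example, not taken from Catapult; the point is only that the source contains no clock, handshake signal, or bus protocol, which is why the protocol can be chosen later as a tool constraint.

```cpp
#include <cstddef>

// Hypothetical untimed function: the interface is just two arrays and a
// scalar. Nothing here says whether "in" arrives as a stream, through a
// handshake, or from a memory; that choice is left to the tool.
template <std::size_t N>
void apply_gain(const int (&in)[N], int (&out)[N], int gain) {
    for (std::size_t i = 0; i < N; ++i) out[i] = in[i] * gain;
}
```

Because the function body never refers to a protocol, the same source can be verified in software and then mapped to different interfaces.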
AXI Example
The AXI Example (Figure 6) shows how to instantiate one or more accelerator elements in an AXI SoC subsystem using the AXI interface IP generated by Catapult HLS. Master, slave, and streaming examples are provided.
Figure 6: AXI example
Basic Processor Example
The basic processor example (Figure 7) builds on the AXI example to show how to connect a machine learning accelerator to a complete processor-based system. The machine learning accelerator in this example uses a simple multiply/accumulate architecture with 2-D convolution and max pooling. Several third-party processor IP models are supported, and a software flow (with associated data) is included for bare-metal programming.
Figure 7: Example of a basic processor platform
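The maximum-over-a-region step in this accelerator, commonly called max pooling, can be sketched as follows. The 2x2 window, the stride of 2, and the names `FeatureMap` and `max_pool_2x2` are assumptions for illustration; the example's actual window size is not stated in the text. The input is assumed to be rectangular, and any odd trailing row or column is dropped.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using FeatureMap = std::vector<std::vector<int>>;

// 2x2 max pooling with stride 2 (assumed parameters). Each output value is
// the largest of the four inputs in its window.
FeatureMap max_pool_2x2(const FeatureMap& in) {
    if (in.empty()) return {};
    std::size_t rows = in.size() / 2;
    std::size_t cols = in[0].size() / 2;
    FeatureMap out(rows, std::vector<int>(cols, 0));
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            int top = std::max(in[2 * r][2 * c], in[2 * r][2 * c + 1]);
            int bottom = std::max(in[2 * r + 1][2 * c], in[2 * r + 1][2 * c + 1]);
            out[r][c] = std::max(top, bottom);
        }
    }
    return out;
}
```

Applied after a convolution layer, this halves each dimension of the feature map while keeping the strongest response in every window.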