Ultra96 (v1) with Vitis: DPU Integration and MIPI Platform Tutorial
Ultra96™ Introduction
The Ultra96 is a great platform for building machine learning applications for edge use cases. The 96Boards form factor and the programmable logic of the Zynq® MPSoC ZU3 device provide the flexibility to add a standard MIPI CSI-2 RX interface for video input, while also integrating the Xilinx Deep Learning Processing Unit (DPU) into the design to drive high-performance, low-power machine learning edge applications. Since many first-time adopters of the Xilinx Vitis™ unified software platform flow will start with an existing Vivado-based design, we begin this tutorial by looking at how to convert a legacy design implemented in Vivado IP Integrator into an accelerated Vitis target platform. The MIPI pipeline in this design leverages the common 96Boards form factor of the Ultra96, uses an OV5640 imaging sensor on a MIPI imaging mezzanine card, and uses the YUYV output format to feed the Frame Buffer Write DMA core directly from the MIPI CSI-2 RX IP and on into the PS DDR. Next, we demonstrate the steps to update the PetaLinux project with the libraries and drivers needed to create a Vitis software platform capable of supporting hardware-accelerated workflows, including DPU-based machine learning applications. Once the hardware and software platform components are complete, we use the Vitis Development Kit to combine them into a Vitis acceleration platform, which we then use to build hardware-accelerated software applications. Finally, we introduce the integration of the DPU for machine learning acceleration. After adding the DPU, we use the provided DPU runtime to evaluate a high-performance face detection application that consumes streaming MIPI input on the generated platform.
01 Design requirements
This section lists the hardware and software tools required to accelerate machine learning algorithms using the Xilinx Deep Learning Processing Unit (DPU) IP.
- Xilinx Design Tools 2019.2
- Ultra96 v1 board files
- Ultra96 board (v1)
- Ultra96 12V power supply
- Micro-USB to USB-A cable
- AES-ACC-USB-JTAG board
- A blank microSD card formatted with the FAT32 file system
- Xilinx MIPI CSI-2 RX IP license
- D3 Engineering DesignCore Camera Mezzanine Board OV5640 (MIPI input source)
Optional
- DisplayPort monitor
- Mini DisplayPort cable for the selected monitor
- USB webcam
02 Prepare your workspace
Copy the repository to your local machine and download the reference files. After downloading, unzip them into the reference_files directory of the repository you just copied. The remaining folders in this hierarchy will be empty after the download completes and will be populated as you work through the tutorial.
03 Generate the base MIPI project
First, we will create the original, non-accelerated MIPI project with the Vivado® and PetaLinux tools. After completing this step, you will have bootable hardware and software images that can be used to start the pipeline and view MIPI video input on the Ultra96.
Vivado
- Copy the "scripts" folder from the [reference_files/base_project/vivado] directory to the top-level vivado directory
- Launch Vivado and create a new project called "u96v1_mipi" in the top-level "vivado" directory, making sure the "Create project subdirectory" option is selected, and click "Next"
- Select "Create an RTL Project" and click "Next"
- Select "Add Files", select the "U96v1_mipi_wrapper.v" file from the [reference_files/base_project/vivado/sources] directory, and click "Next"
- Select "Add Files", select the "cnst.xdc" file from the [reference_files/base_project/vivado/sources] directory, and click "Next"
- Select the "Boards" tab, choose the Ultra96 V1 board for the project, then select "Next" and "Finish" to create the project
- In the "Tcl Console" tab at the bottom of the screen, change directories to the top-level "vivado" directory
- In the Tcl console, run source ./scripts/u96v1_mipi_static.tcl to build the block design
- Select "Generate Block Design" in the Flow Navigator window and allow this to complete
- Build the project through bitstream generation
- Select "File > Export > Export Hardware" and export the hardware, including the bitstream, to the hw_platform top-level directory
PetaLinux
- Set up the PetaLinux tools environment
- Change directory to the top-level directory and create a new PetaLinux project using the following command: petalinux-create -t project -n petalinux -s reference_files/base_project/bsps/u96v1_mipi_static.bsp
- Change into the newly created petalinux directory
- Run petalinux-config --get-hw-description ../hw_platform --silentconfig to import the generated hardware
- Browse the petalinux-config -c rootfs and petalinux-config -c kernel menus to see what customizations have been made to support MIPI on the Ultra96
- Run petalinux-build to build the system
- Run petalinux-package --boot --force --fsbl --pmufw --u-boot --fpga to create BOOT.bin
- Copy BOOT.bin and image.ub from the [petalinux/images/linux] directory to the SD card and use it to boot the system
- [Enter GStreamer commands here to test video input]
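The SD card copy step above can be sketched as a short shell session. The paths below are stand-ins created so the sketch is self-contained; on a host PC the mounted card location will differ (commonly under /media or /run/media, an assumption worth checking on your system):

```shell
set -e
# Stand-in directories for the PetaLinux build output and the mounted SD card
mkdir -p petalinux/images/linux /tmp/sdcard
touch petalinux/images/linux/BOOT.bin petalinux/images/linux/image.ub

# Copy the two boot artifacts onto the FAT32 boot partition
cp petalinux/images/linux/BOOT.bin petalinux/images/linux/image.ub /tmp/sdcard/
ls /tmp/sdcard
```

With the real build output, substitute the actual mount point of your formatted card for /tmp/sdcard.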
04 Create the Vitis hardware platform
We will now make the necessary additions and modifications to the hardware design to prepare it for software-defined acceleration: Open the Vivado base project to get started.
05 Configure the MPSoC block
As we add additional components to the hardware design to meet acceleration requirements, the processing subsystem needs to be customized. Here, we will modify the configuration to create additional clocks, open additional interrupt ports, and create AXI master ports to add additional peripherals to the design.
- Double-click the Zynq IP block to open the Processor Configuration Wizard (PCW)
- Go to the Clock Configuration tab, select the Output Clocks tab, and expand Low Power Domain Clocks > PL Fabric Clocks
- Enable PL1, change the requested frequency of both clocks to 100 MHz, and select IOPLL as the source clock for both
- Go to the PS-PL Configuration tab, expand General > Fabric Resets, and enable the second fabric reset
- Add another AXI master port, which we will use later to connect the interrupt controller: click PS-PL Interfaces > Master Interfaces and enable AXI HPM0 LPD
- Expand General > Interrupts > PS-PL
- Enable IRQ1[0-7]
- Click OK to exit the PCW
06 Configure platform interfaces
For the Vitis tool to insert hardware acceleration blocks into the design, we need to leave open interfaces it can use to connect them. In this design we need memory-mapped interfaces so that the DPU can reach the PS DDR. On this platform we will open three HP slave ports, since there are three memory-mapped master ports on the DPU block. This part of the flow also lets us "name" each port, giving it a short tag used to specify connections later.
- From the Window menu, select Platform Interfaces
- In the Platform Interfaces tab, click Enable Platform Interfaces
- Right-click and enable three unused PS HPx_FPD slave interfaces (HP0, HP1, and HP2)
- Also enable the HPM0 master interface, and make sure this interface is disabled on the Zynq PS block; the tools will use this master to connect to the accelerator
- For each enabled slave interface, add an "sptag" value in the "Options" tab that will be used to reference that port later in the flow: HP0, HP1, and HP2 respectively
07 Specify the platform clocks
Similar to how we specified the platform interfaces, we now have to indicate which clocks the tool should use for accelerators in the platform. The DPU uses two clocks (a 1x and a 2x clock), so we will expose 250 MHz and 500 MHz clocks to the platform. The DPU can be clocked faster or slower than this; the rate is chosen to balance power consumption against frame-rate performance in the application.
- Right-click on the block design, select "Add IP", then add the Clocking Wizard IP
- Change the instance name to clk_wiz_dynamic
- Double-click the clk_wiz_dynamic IP and make the following changes in the Output Clocks tab: [clk_out1 = 250 MHz], [clk_out2 = 500 MHz], [Matched Routing on both outputs], [Reset Type = Active Low]
- Move the original clock wizard (clk_wiz_static) from pl_clk0 to pl_clk1
- In the Platform Interfaces tab, enable clk_out1 and clk_out2 of the clk_wiz_dynamic instance
- Set the slower clock (clk_out1 in this case) as the default
- The id of clk_out1 should be set to 0, and the id of clk_out2 should be set to 1
- Make sure the proc_sys_reset block listed for each clock is the instance connected to that clock
08 Separate the original components
In this design, we chose to place the base components (the MIPI subsystem) on a separate clock from the acceleration components. The accelerator's clocking wizard and processor system resets are driven from the PL0 clock, and the MIPI subsystem from the PL1 clock. This ensures that any change in clock frequency (or clock gating) on either the base or acceleration side will not affect the operation of the other.
- Right-click pl_clk0 and select "Disconnect Pin" from the menu
- Connect pl_clk0 to clk_wiz_dynamic clk_in1, and pl_clk1 to clk_wiz_static clk_in1
- Delete the net connected to pl_reset0
- Right-click the block design, select "Add IP", and add a Processor System Reset IP for each new clock
- Name them proc_sys_reset_dynamic_1 and proc_sys_reset_dynamic_2
- Connect the clk_out1 and clk_out2 outputs of the clk_wiz_dynamic block to the slowest_sync_clk inputs of proc_sys_reset_dynamic_1 and proc_sys_reset_dynamic_2, respectively
- Connect PS pl_reset0 to the ext_reset_in input of both new processor system reset blocks
- Connect pl_reset0 to the reset port of clk_wiz_dynamic
- Connect pl_reset1 to the reset port of clk_wiz_static and to the ext_reset_in pin of proc_sys_reset_200
- Connect the clk_wiz_dynamic locked output to the dcm_locked inputs of the two new processor system reset blocks
09 Enable interrupt-based kernel
The default scheduling mode for acceleration kernels is polling. To enable interrupt-based processing in the platform, we need to add an interrupt controller. In the current design we connect a constant "gnd" to the interrupt controller, since no valid interrupt source exists yet. Paired with the AXI interrupt controller is the "dynamic_postlink" Tcl script in the Vivado sources, which selects the interrupt constant net, disconnects it from the Concat block, and automatically connects the accelerator interrupts once the Vitis tool adds the acceleration kernels.
- Right-click the block design, select "Add IP", and add the AXI Interrupt Controller
- In the block properties of the interrupt controller, set the name to axi_intc_dynamic
- Add a Concat IP to gather the inputs to the interrupt controller
- In the block properties of the Concat block, set the name to int_concat_dynamic
- Double-click the Concat block and change the number of ports to 8
- Add a Constant IP to provide a constant "0" to the interrupt controller; this constant will be disconnected by the tool at compile time and replaced by the accelerator interrupt connections
- Double-click the Constant IP and change the constant value to 0
- Click the Run Connection Automation link in the Designer Assistance bar and connect the slave AXI interface of the AXI interrupt controller; select HPM0_LPD, since HPM1_FPD is used for the video subsystem
- Connect the Constant output to all of the Concat block inputs
- Connect the output of the Concat block to the intr input of the interrupt controller
- Connect the output of the interrupt controller to pl_ps_irq1 on the PS block
- Select the output net of the Constant block and name it int_const_net
10 Generate the design and XSA
Now that we have customized the design, we can export it to the Vitis tool as a Xilinx Support Archive (XSA). Note that we will not build this project into a bitstream: the Vitis tool uses the archive to import the design, synthesizes the hardware accelerators into it, and then builds the bitstream itself. The dsa.tcl script automates this part of the process; it sets the naming and platform details before exporting the XSA file to the hw_platform directory. In addition, the script links in the previously mentioned dynamic_postlink.tcl script so that the scripts specific to this platform are included in the archive.
- Generate the block design
- Export the hardware platform by running source ./scripts/dsa.tcl
11 Create a software platform
The software platform requires some changes to the PetaLinux project to add the necessary Xilinx Runtime (XRT) components to the design. At this point there are two options: follow all of the steps below to copy the necessary files and enable these components in PetaLinux, or skip steps 1-8 and recreate the PetaLinux project from the provided BSP at [reference_files/platform_project/bsps/u96v1_mipi_dynamic.bsp].
12 Add the Xilinx runtime recipes
The first step in creating an acceleration platform is to add the library components mentioned above: the Xilinx runtime (XRT) and the DPU runtime (DNNDK). They come in the form of recipes that we will add to the user layer of the PetaLinux build. First copy the recipe files, then enable them through the PetaLinux root file system configuration menu.
- Change directory to the PetaLinux directory
- Add the recipe that installs the DPU utilities, libraries, and header files into the root file system: cp -rp ../reference_files/platform_project/plnx/recipes-apps/dnndk project-spec/meta-user/recipes-apps
- Add the recipes for the Xilinx Runtime (XRT) drivers and libraries: cp -rp ../reference_files/platform_project/plnx/recipes-xrt project-spec/meta-user
- Add the recipe for an "autostart" script that runs automatically during Linux boot: cp -rp ../reference_files/platform_project/plnx/recipes-apps/autostart project-spec/meta-user/recipes-apps
- Add the above recipes to the PetaLinux image configuration: edit project-spec/meta-user/recipes-core/images/petalinux-image-full.bbappend and add the following to the end of the file:
IMAGE_INSTALL_append = " dnndk"
IMAGE_INSTALL_append = " autostart"
IMAGE_INSTALL_append = " opencl-headers"
IMAGE_INSTALL_append = " ocl-icd"
IMAGE_INSTALL_append = " xrt"
IMAGE_INSTALL_append = " xrt-dev"
IMAGE_INSTALL_append = " zocl"
- Update the PetaLinux project with the new XSA exported from Vivado: petalinux-config --get-hw-description=../hw_platform --silentconfig
- Open the PetaLinux root file system configuration GUI (petalinux-config -c rootfs) and enable all of the above packages in the "User Packages" submenu
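If you prefer to script the bbappend edit above, a minimal sketch looks like this. The mkdir creates a stand-in directory tree so the snippet is self-contained; in a real PetaLinux project that path already exists:

```shell
set -e
BBAPPEND=project-spec/meta-user/recipes-core/images/petalinux-image-full.bbappend
mkdir -p "$(dirname "$BBAPPEND")"   # stand-in tree for illustration

# Append the package list to the image recipe
cat >> "$BBAPPEND" <<'EOF'
IMAGE_INSTALL_append = " dnndk"
IMAGE_INSTALL_append = " autostart"
IMAGE_INSTALL_append = " opencl-headers"
IMAGE_INSTALL_append = " ocl-icd"
IMAGE_INSTALL_append = " xrt"
IMAGE_INSTALL_append = " xrt-dev"
IMAGE_INSTALL_append = " zocl"
EOF

# Sanity check: seven IMAGE_INSTALL_append lines were added
grep -c '^IMAGE_INSTALL_append' "$BBAPPEND"
```

Using a quoted heredoc ('EOF') keeps the `" dnndk"`-style values from being expanded by the shell.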
13 Modify the Linux device tree
The Linux device tree needs to be modified so that the Xilinx runtime kernel driver (zocl) is properly detected. Modify [project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi] to add the Zynq OpenCL node to the device tree.
At the bottom of project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi, add the following text:
&amba {
    zyxclmm_drm: zyxclmm_drm@0xA0000000 {
        reg = <0x0 0xA0000000 0x0 0x800000>;
        compatible = "xlnx,zocl";
        status = "okay";
        interrupt-parent = <&axi_intc_dynamic>;
        interrupts = <0 1>, <1 1>, <2 1>, <3 1>,
                     <4 1>, <5 1>, <6 1>, <7 1>;
    };
};
14 Build PetaLinux and package the software components
At this point we have made all the necessary configuration changes for the PetaLinux build and can start it. The build may take a while, depending on the processing power of your machine. Once the Linux build is complete, we move all of the built software components into a common directory; having all of the boot components in one place simplifies packaging the resulting platform on both the hardware and software sides. We will also use PetaLinux to build a sysroot, which provides a complete cross-compilation environment for this software platform. The sysroot is included in the software portion of the platform as well, since it supplies the correct versions of the header/include files when compiling for the platform.
- Build PetaLinux: petalinux-build
- Copy all .elf files from the [petalinux/images/linux] directory to [sw_platform/boot]; this should copy the following files:
o ARM Trusted Firmware - bl31.elf
o PMU firmware - pmufw.elf
o U-Boot - u-boot.elf
o Zynq FSBL - zynqmp_fsbl.elf
- Copy the image.ub file from the [petalinux/images/linux] directory to [sw_platform/image]
- Copy the linux.bif file from the [reference_files/platform_project/plnx] directory to [sw_platform/boot]
- Build the Yocto SDK (which provides the sysroot) from the project: petalinux-build --sdk
- Move [images/linux/sdk.sh] to [sw_platform/sysroot], then extract the SDK: cd sw_platform/sysroot && ./sdk.sh -d ./ -y
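The copy steps above can be sketched as one shell session. The touch commands create empty stand-ins for the build outputs so the sketch is self-contained; in a real project these files come from petalinux-build:

```shell
set -e
# Stand-ins for the PetaLinux build outputs (normally produced by petalinux-build)
mkdir -p petalinux/images/linux reference_files/platform_project/plnx
touch petalinux/images/linux/bl31.elf petalinux/images/linux/pmufw.elf \
      petalinux/images/linux/u-boot.elf petalinux/images/linux/zynqmp_fsbl.elf \
      petalinux/images/linux/image.ub
touch reference_files/platform_project/plnx/linux.bif

# Gather the boot components, kernel image, and BIF into sw_platform
mkdir -p sw_platform/boot sw_platform/image sw_platform/sysroot
cp petalinux/images/linux/*.elf sw_platform/boot/
cp petalinux/images/linux/image.ub sw_platform/image/
cp reference_files/platform_project/plnx/linux.bif sw_platform/boot/
ls sw_platform/boot
```

Keeping this layout (boot, image, sysroot) is what the platform project steps below assume when pointing at [sw_platform].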
15 Generate Vitis software platform
The Vitis software platform is a set of components that includes everything needed to boot and develop for a specific board and configuration, and it consists of both hardware and software components. Now that we have built the hardware (XSA) and software (Linux image and boot ELF files) components, we can use them to generate and export a user-defined custom platform. We will walk through these steps in the Xilinx Vitis Development Kit.
- Open the Vitis IDE and select the top-level workspace directory as the workspace
- Select File > New > Platform Project
- Name the platform "u96v1_mipi"
- Select "Create from hardware specification (XSA)" and select the XSA in [hw_platform]
- Select the Linux operating system and the psu_cortexa53 processor
- Double-click platform.spr in the file navigator to open the project
- Customize the "linux on psu_cortexa53" domain to point to the boot components and BIF in [sw_platform/boot]
- Customize the "linux on psu_cortexa53" domain to point to the image directory in [sw_platform/image]
- Click the "hammer" icon or the "Generate Platform" button to generate the platform project's output products
Now that the platform has been built, notice the "export" directory. This directory holds the complete built platform, ready to be zipped and shared, giving new developers everything they need to work with this custom platform.
16 Create a face detection application project
For the final application, we target the MIPI platform with a machine learning application. We use the pre-built Xilinx Deep Learning Processor Unit (DPU) as the acceleration kernel, compile it into the platform using the Xilinx Vitis IDE, and then build a user-space application that invokes the hardware to run a custom face detection application.
17 Create a new application project
Start by creating a new application project. In the Vitis tools, application projects are kept in a system project container to provide a means for cohesive system development across the various enabled domains in a platform (for example, A53 and R5). Since we are working in the same workspace as before, it is easy to target the platform we generated previously, but you can also add other platform repositories by clicking the "+" button in the "Platform Selection" dialog and pointing to the directory containing the .xpfm file.
- Open the Vitis IDE and select the top-level workspace directory as the workspace
- Select File > New > Application Project
- Name the project "face_detection" and use the automatically generated system project name
- Select the "u96v1_mipi" platform you just created
- Confirm that the Linux domain is selected on the next screen, and point to the sysroot generated in sw_platform
- Select "Empty Application" as the template and click Finish
18 Edit Build Settings
- Right-click [face_detection/src] in the file navigator and select "Import"
- Select "General" then "File System"
- Use [reference_files/application] as the source location and import all of the files there
- Right-click face_detection in the file navigator and select C/C++ Build Settings
- If you are not in the C/C++ Build Settings menu, navigate to it
- For Configuration, select All Configurations
- In the GCC Host Linker > Libraries submenu, click the green "+" to add the following libraries:
- n2cube
- dputils
- opencv_core
- opencv_imgcodecs
- opencv_highgui
- opencv_imgproc
- opencv_videoio
- In the Host Linker > Miscellaneous > Other Objects submenu, add dpu_densebox.elf from the project's src directory
- In the host compiler Includes section, select the XILINX_VIVADO_HLS include path and click the red "X" to delete it
- Click Apply and Close
19 Add the DPU as a hardware accelerator
Finally, we add the DPU as a hardware acceleration kernel and use Vitis to connect and compile the design.
- Double-click project.sdx under the face_detection project to open the project view
- Under Hardware Functions, click the "lightning bolt" icon to add a new accelerator
- Select the "dpu_xrt_top" kernel
- Click binary_container_1 and change the name to dpu
- Right-click "dpu" and select the "Edit V++ Options" option
- Add --config ../src/connections.ini to specify which DPU ports are connected to the platform interfaces created above
- In the top right corner, change the active build configuration to "System"
- Click the "hammer" icon to build the project
This may take 30 minutes or more, depending on the computer you are using for the build. You may have noticed that we never took the hardware portion of the design through bitstream generation. When run, the Vitis tool uses the "open" interfaces in the hardware design, imports the DPU into the design, and connects those interfaces to match what is called out in connections.ini. Once the design and new components are stitched together, synthesis and implementation are run to generate the binary that is loaded into the fabric.
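For reference, a connections.ini in this flow is a v++ configuration file whose [connectivity] section maps each kernel AXI master port to one of the sptag names defined on the platform. The DPU instance and port names below are assumptions for illustration only; the file shipped in [reference_files/application] is authoritative:

```ini
[connectivity]
; Map the DPU's three memory-mapped masters to the platform sptags (HP0-HP2).
; Instance and port names here are illustrative; check the provided connections.ini.
sp=dpu_xrt_top_1.M_AXI_GP0:HP0
sp=dpu_xrt_top_1.M_AXI_HP0:HP1
sp=dpu_xrt_top_1.M_AXI_HP2:HP2
```

Each sp= line has the form kernel_instance.port:sptag, which is how the tool knows to wire the DPU masters to the HP slave ports we opened in section 06.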
20 Running the Application on Ultra96
After the build process is complete, you will find a supplementary sd_card folder under the system directory of the project. Copy the sd_card image to a formatted SD card to boot the board. Once the board has successfully booted, you can follow a few quick steps to run the design.
- On the board, change directory to [/run/media/mmcblk0p1/]
- Copy the dpu.xclbin file to /usr/lib
- Run face_detection.elf
When run without arguments, the face_detection application prints a help dialog containing example pipelines (MIPI, webcam, UDP streaming) that can be run through the application. These are provided to the application as GStreamer-style sink strings, allowing easy customization of the face detection application.
Example pipelines:
"./face_detection -i /dev/video0 -o autovideosink" will display over X11 forwarding or on a local monitor
"./face_detection -i /dev/video0 -o udpsink host=192.168.1.50 port=8080" will stream over UDP