Ultra96 (v1) with Vitis: DPU Integration and MIPI Platform Tutorial
Ultra96™ Introduction
The Ultra96 is a great platform for building machine learning applications for edge use cases. The 96Boards form factor and the programmable logic of the Zynq® MPSoC ZU3 device provide the flexibility to add a standard MIPI CSI-2 RX interface for video input, while also integrating the Xilinx Deep Learning Processing Unit (DPU) into the design to drive high-performance, low-power machine learning edge applications. Since many first-time adopters of the Xilinx Vitis™ unified software platform flow will start with an existing Vivado-based design, we begin this tutorial by looking at how to convert a legacy design implemented in Vivado IP Integrator into an accelerated Vitis target platform. The MIPI pipeline in this design leverages the common 96Boards form factor of the Ultra96, uses an OV5640 imaging sensor on a MIPI imaging mezzanine card, and uses the YUYV output format to feed the Frame Buffer Write DMA core directly from the MIPI CSI-2 RX IP and on into the PS DDR. Next, we demonstrate the steps to update the PetaLinux project with the libraries and drivers needed to create a Vitis software platform capable of supporting hardware-accelerated workflows, including DPU-based machine learning applications. Once the hardware and software platform components are complete, we use the Vitis Development Kit to combine them into a Vitis acceleration platform, which we then use to build hardware-accelerated software applications. Finally, we introduce the integration of the DPU for machine learning acceleration. After adding the DPU, we use the provided DPU runtime to evaluate a high-performance face detection application that consumes streaming MIPI input on the generated platform.
01 Design requirements
This section lists the hardware and software tools required to accelerate machine learning algorithms using the Xilinx Deep Learning Processing Unit (DPU) IP.
- Xilinx Design Tools 2019.2
- Ultra96 v1 board files
- Ultra96 board (v1)
- Ultra96 12V power supply
- Micro-USB to USB-A cable
- AES-ACC-USB-JTAG board
- A blank microSD card formatted with the FAT32 file system
- Xilinx MIPI CSI-2 RX IP license
- D3 Engineering DesignCore Camera Mezzanine Board OV5640 (MIPI input source)
Optional
- DisplayPort monitor
- Mini DisplayPort cable for the selected monitor
- USB webcam
02 Prepare your workspace
Copy the repository to your local machine and download the reference files. After downloading, unzip them into the reference_files directory of the repository you just copied. The remaining folders in this hierarchy will be empty after the download completes and will be populated as you work through the tutorial.
03 Generate the base MIPI project
First, we will create the original, non-accelerated MIPI project with the Vivado® and PetaLinux tools. After completing this step, you will have bootable hardware and software images that can be used to start the pipeline and view MIPI video input on the Ultra96.
Vivado
- Copy the "scripts" folder from the [reference_files/base_project/vivado] directory to the top-level vivado directory
- Launch Vivado and create a new project called "u96v1_mipi" in the top-level "vivado" directory, making sure the "Create project subdirectory" option is selected, and click "Next"
- Select "Create an RTL Project" and click "Next"
- Select "Add Files", select the "U96v1_mipi_wrapper.v" file from the [reference_files/base_project/vivado/sources] directory, and click "Next"
- Select "Add Files", select the "cnst.xdc" file from the [reference_files/base_project/vivado/sources] directory, and click "Next"
- Select the "Boards" tab, choose the Ultra96 V1 board for the project, then select "Next" and "Finish" to create the project
- In the "Tcl Console" tab at the bottom of the screen, change directories to the top-level "vivado" directory
- In the Tcl console, run source ./scripts/u96v1_mipi_static.tcl to build the block design
- Select "Generate Block Design" in the Flow Navigator window and allow this to complete
- Build the project through bitstream generation
- Select "File > Export > Export Hardware" and export the hardware, including the bitstream, to the hw_platform top-level directory
PetaLinux
- Set up the PetaLinux tools environment
- Change directory to the top-level directory and create a new PetaLinux project using the following command: petalinux-create -t project -n petalinux -s reference_files/base_project/bsps/u96v1_mipi_static.bsp
- Change into the newly created petalinux directory
- Run petalinux-config --get-hw-description ../hw_platform --silentconfig to import the generated hardware
- Browse the petalinux-config -c rootfs and petalinux-config -c kernel menus to see what customizations have been made to support MIPI on the Ultra96
- Run petalinux-build to build the system
- Run petalinux-package --boot --force --fsbl --pmufw --u-boot --fpga to create BOOT.bin
- Copy BOOT.bin and image.ub from the [petalinux/images/linux] directory to the SD card and use it to boot the system
- [Enter GStreamer commands here to test video input]
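The SD card copy step above can be sketched as a short shell session. The paths below are stand-ins created so the sketch is self-contained; on a host PC the mounted card location will differ (commonly under /media or /run/media, an assumption worth checking on your system):

```shell
set -e
# Stand-in directories for the PetaLinux build output and the mounted SD card
mkdir -p petalinux/images/linux /tmp/sdcard
touch petalinux/images/linux/BOOT.bin petalinux/images/linux/image.ub

# Copy the two boot artifacts onto the FAT32 boot partition
cp petalinux/images/linux/BOOT.bin petalinux/images/linux/image.ub /tmp/sdcard/
ls /tmp/sdcard
```

With the real build output, substitute the actual mount point of your formatted card for /tmp/sdcard.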
04 Create the Vitis hardware platform
We will now make the necessary additions and modifications to the hardware design to prepare it for software-defined acceleration: Open the Vivado base project to get started.
05 Configure the MPSoC block
As we add additional components to the hardware design to meet acceleration requirements, the processing subsystem needs to be customized. Here, we will modify the configuration to create additional clocks, open additional interrupt ports, and create AXI master ports to add additional peripherals to the design.
- Double-click the Zynq IP block to open the Processor Configuration Wizard (PCW)
- Go to the Clock Configuration tab, select the Output Clocks tab, and expand Low Power Domain Clocks > PL Fabric Clocks
- Enable PL1, change the requested frequency of both clocks to 100 MHz, and select IOPLL as the source clock for both
- Go to the PS-PL Configuration tab, expand General > Fabric Resets, and enable the second fabric reset
- Add another AXI master port, which we will use later to connect the interrupt controller: click PS-PL Interfaces > Master Interfaces and enable AXI HPM0 LPD
- Expand General > Interrupts > PS-PL
- Enable IRQ1[0-7]
- Click OK to exit the PCW
06 Configure platform interfaces
For the Vitis tool to insert hardware acceleration blocks into the design, we need to leave open interfaces it can use to connect them. In this design we need memory-mapped interfaces so that the DPU can reach the PS DDR. On this platform we will open three HP slave ports, since there are three memory-mapped master ports on the DPU block. This part of the flow also lets us "name" each port, giving it a short tag used to specify connections later.
- From the Window menu, select Platform Interfaces
- In the Platform Interfaces tab, click Enable Platform Interfaces
- Right-click and enable three unused PS HPx_FPD slave interfaces (HP0, HP1, and HP2)
- Also enable the HPM0 master interface, and make sure this interface is disabled on the Zynq PS block; the tools will use this master to connect to the accelerator
- For each enabled slave interface, add an "sptag" value in the "Options" tab that will be used to reference that port later in the flow: HP0, HP1, and HP2 respectively
07 Specify the platform clocks
Similar to how we specified the platform interfaces, we now have to indicate which clocks the tool should use for accelerators in the platform. The DPU uses two clocks (a 1x and a 2x clock), so we will expose 250 MHz and 500 MHz clocks to the platform. The DPU can be clocked faster or slower than this; the rate is chosen to balance power consumption against frame-rate performance in the application.
- Right-click on the block design, select "Add IP", then add the Clocking Wizard IP
- Change the instance name to clk_wiz_dynamic
- Double-click the clk_wiz_dynamic IP and make the following changes in the Output Clocks tab: [clk_out1 = 250 MHz], [clk_out2 = 500 MHz], [Matched Routing on both outputs], [Reset Type = Active Low]
- Move the original clock wizard (clk_wiz_static) from pl_clk0 to pl_clk1
- In the Platform Interfaces tab, enable clk_out1 and clk_out2 of the clk_wiz_dynamic instance
- Set the slower clock (clk_out1 in this case) as the default
- The id of clk_out1 should be set to 0, and the id of clk_out2 should be set to 1
- Make sure the proc_sys_reset block listed for each clock is the instance connected to that clock
08 Separate the original components
In this design, we chose to place the base components (the MIPI subsystem) on a separate clock from the acceleration components. The accelerator's clocking wizard and processor system resets are driven from the PL0 clock, and the MIPI subsystem from the PL1 clock. This ensures that any change in clock frequency (or clock gating) on either the base or acceleration side will not affect the operation of the other.
- Right-click pl_clk0 and select "Disconnect Pin" from the menu
- Connect pl_clk0 to clk_wiz_dynamic clk_in1, and pl_clk1 to clk_wiz_static clk_in1
- Delete the net connected to pl_reset0
- Right-click the block design, select "Add IP", and add a Processor System Reset IP for each new clock
- Name them proc_sys_reset_dynamic_1 and proc_sys_reset_dynamic_2
- Connect the clk_out1 and clk_out2 outputs of the clk_wiz_dynamic block to the slowest_sync_clk inputs of proc_sys_reset_dynamic_1 and proc_sys_reset_dynamic_2, respectively
- Connect PS pl_reset0 to the ext_reset_in input of both new processor system reset blocks
- Connect pl_reset0 to the reset port of clk_wiz_dynamic
- Connect pl_reset1 to the reset port of clk_wiz_static and to the ext_reset_in pin of proc_sys_reset_200
- Connect the clk_wiz_dynamic locked output to the dcm_locked inputs of the two new processor system reset blocks
09 Enable interrupt-based kernel
The default scheduling mode for acceleration kernels is polling. To enable interrupt-based processing in the platform, we need to add an interrupt controller. In the current design we connect a constant "gnd" to the interrupt controller, since no valid interrupt source exists yet. Paired with the AXI interrupt controller is the "dynamic_postlink" Tcl script in the Vivado sources, which selects the interrupt constant net, disconnects it from the Concat block, and automatically connects the accelerator interrupts once the Vitis tool adds the acceleration kernels.
- Right-click the block design, select "Add IP", and add the AXI Interrupt Controller
- In the block properties of the interrupt controller, set the name to axi_intc_dynamic
- Add a Concat IP to gather the inputs to the interrupt controller
- In the block properties of the Concat block, set the name to int_concat_dynamic
- Double-click the Concat block and change the number of ports to 8
- Add a Constant IP to provide a constant "0" to the interrupt controller; this constant will be disconnected by the tool at compile time and replaced by the accelerator interrupt connections
- Double-click the Constant IP and change the constant value to 0
- Click the Run Connection Automation link in the Designer Assistance bar and connect the slave AXI interface of the AXI interrupt controller; select HPM0_LPD, since HPM1_FPD is used for the video subsystem
- Connect the Constant output to all of the Concat block inputs
- Connect the output of the Concat block to the intr input of the interrupt controller
- Connect the output of the interrupt controller to pl_ps_irq1 on the PS block
- Select the output net of the Constant block and name it int_const_net
10 Generate the design and XSA
Now that we have customized the design, we can export it to the Vitis tool as a Xilinx Support Archive (XSA). Note that we will not build this project into a bitstream: the Vitis tool uses the archive to import the design, synthesizes the hardware accelerators into it, and then builds the bitstream itself. The dsa.tcl script automates this part of the process; it sets the naming and platform details before exporting the XSA file to the hw_platform directory. In addition, the script links in the previously mentioned dynamic_postlink.tcl script so that the scripts specific to this platform are included in the archive.
- Generate the block design
- Export the hardware platform by running source ./scripts/dsa.tcl
11 Create a software platform
The software platform requires some changes to the PetaLinux project to add the necessary Xilinx Runtime (XRT) components to the design. At this point there are two options: follow all of the steps below to copy the necessary files and enable these components in PetaLinux, or skip steps 1-8 and recreate the PetaLinux project from the provided BSP at [reference_files/platform_project/bsps/u96v1_mipi_dynamic.bsp].
12 Add the Xilinx runtime recipes
The first step in creating an acceleration platform is to add the library components mentioned above: the Xilinx runtime (XRT) and the DPU runtime (DNNDK). They come in the form of recipes that we will add to the user layer of the PetaLinux build. First copy the recipe files, then enable them through the PetaLinux root file system configuration menu.
- Change directory to the PetaLinux directory
- Add the recipe that installs the DPU utilities, libraries, and header files into the root file system: cp -rp ../reference_files/platform_project/plnx/recipes-apps/dnndk project-spec/meta-user/recipes-apps
- Add the recipes for the Xilinx Runtime (XRT) drivers and libraries: cp -rp ../reference_files/platform_project/plnx/recipes-xrt project-spec/meta-user
- Add the recipe for an "autostart" script that runs automatically during Linux boot: cp -rp ../reference_files/platform_project/plnx/recipes-apps/autostart project-spec/meta-user/recipes-apps
- Add the above recipes to the PetaLinux image configuration: edit project-spec/meta-user/recipes-core/images/petalinux-image-full.bbappend and add the following to the end of the file:
IMAGE_INSTALL_append = " dnndk"
IMAGE_INSTALL_append = " autostart"
IMAGE_INSTALL_append = " opencl-headers"
IMAGE_INSTALL_append = " ocl-icd"
IMAGE_INSTALL_append = " xrt"
IMAGE_INSTALL_append = " xrt-dev"
IMAGE_INSTALL_append = " zocl"
- Update the PetaLinux project with the new XSA exported from Vivado: petalinux-config --get-hw-description=../hw_platform --silentconfig
- Open the PetaLinux root file system configuration GUI (petalinux-config -c rootfs) and enable all of the above packages in the "User Packages" submenu
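If you prefer to script the bbappend edit above, a minimal sketch looks like this. The mkdir creates a stand-in directory tree so the snippet is self-contained; in a real PetaLinux project that path already exists:

```shell
set -e
BBAPPEND=project-spec/meta-user/recipes-core/images/petalinux-image-full.bbappend
mkdir -p "$(dirname "$BBAPPEND")"   # stand-in tree for illustration

# Append the package list to the image recipe
cat >> "$BBAPPEND" <<'EOF'
IMAGE_INSTALL_append = " dnndk"
IMAGE_INSTALL_append = " autostart"
IMAGE_INSTALL_append = " opencl-headers"
IMAGE_INSTALL_append = " ocl-icd"
IMAGE_INSTALL_append = " xrt"
IMAGE_INSTALL_append = " xrt-dev"
IMAGE_INSTALL_append = " zocl"
EOF

# Sanity check: seven IMAGE_INSTALL_append lines were added
grep -c '^IMAGE_INSTALL_append' "$BBAPPEND"
```

Using a quoted heredoc ('EOF') keeps the `" dnndk"`-style values from being expanded by the shell.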
13 Modify the Linux device tree
The Linux device tree needs to be modified so that the Xilinx runtime kernel driver (zocl) is properly detected. Modify [project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi] to add the Zynq OpenCL node to the device tree.
At the bottom of project-spec/meta-user/recipes-bsp/device-tree/files/system-user.dtsi, add the following text:
&amba {
    zyxclmm_drm: zyxclmm_drm@0xA0000000 {
        reg = <0x0 0xA0000000 0x0 0x800000>;
        compatible = "xlnx,zocl";
        status = "okay";
        interrupt-parent = <&axi_intc_dynamic>;
        interrupts = <0 1>, <1 1>, <2 1>, <3 1>,
                     <4 1>, <5 1>, <6 1>, <7 1>;
    };
};
14 Build PetaLinux and package the software components
At this point we have made all the necessary configuration changes for the PetaLinux build and can start it. The build may take a while, depending on the processing power of your machine. Once the Linux build is complete, we move all of the built software components into a common directory; having all of the boot components in one place simplifies packaging the resulting platform on both the hardware and software sides. We will also use PetaLinux to build a sysroot, which provides a complete cross-compilation environment for this software platform. The sysroot is included in the software portion of the platform as well, since it supplies the correct versions of the header/include files when compiling for the platform.
- Build PetaLinux: petalinux-build
- Copy all .elf files from the [petalinux/images/linux] directory to [sw_platform/boot]; this should copy the following files:
o ARM Trusted Firmware - bl31.elf
o PMU firmware - pmufw.elf
o U-Boot - u-boot.elf
o Zynq FSBL - zynqmp_fsbl.elf
- Copy the image.ub file from the [petalinux/images/linux] directory to [sw_platform/image]
- Copy the linux.bif file from the [reference_files/platform_project/plnx] directory to [sw_platform/boot]
- Build the Yocto SDK (which provides the sysroot) from the project: petalinux-build --sdk
- Move [images/linux/sdk.sh] to [sw_platform/sysroot], then extract the SDK: cd sw_platform/sysroot && ./sdk.sh -d ./ -y
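The copy steps above can be sketched as one shell session. The touch commands create empty stand-ins for the build outputs so the sketch is self-contained; in a real project these files come from petalinux-build:

```shell
set -e
# Stand-ins for the PetaLinux build outputs (normally produced by petalinux-build)
mkdir -p petalinux/images/linux reference_files/platform_project/plnx
touch petalinux/images/linux/bl31.elf petalinux/images/linux/pmufw.elf \
      petalinux/images/linux/u-boot.elf petalinux/images/linux/zynqmp_fsbl.elf \
      petalinux/images/linux/image.ub
touch reference_files/platform_project/plnx/linux.bif

# Gather the boot components, kernel image, and BIF into sw_platform
mkdir -p sw_platform/boot sw_platform/image sw_platform/sysroot
cp petalinux/images/linux/*.elf sw_platform/boot/
cp petalinux/images/linux/image.ub sw_platform/image/
cp reference_files/platform_project/plnx/linux.bif sw_platform/boot/
ls sw_platform/boot
```

Keeping this layout (boot, image, sysroot) is what the platform project steps below assume when pointing at [sw_platform].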
15 Generate Vitis software platform
The Vitis software platform is a set of components that includes everything needed to boot and develop for a specific board and configuration, and it consists of both hardware and software components. Now that we have built the hardware (XSA) and software (Linux image and boot ELF files) components, we can use them to generate and export a user-defined custom platform. We will walk through these steps in the Xilinx Vitis Development Kit.
- Open the Vitis IDE and select the top-level workspace directory as the workspace
- Select File > New > Platform Project
- Name the platform "u96v1_mipi"
- Select "Create from hardware specification (XSA)" and select the XSA in [hw_platform]
- Select the Linux operating system and the psu_cortexa53 processor
- Double-click platform.spr in the file navigator to open the project
- Customize the "linux on psu_cortexa53" domain to point to the boot components and BIF in [sw_platform/boot]
- Customize the "linux on psu_cortexa53" domain to point to the image directory in [sw_platform/image]
- Click the "hammer" icon or the "Generate Platform" button to generate the platform project's output products
Now that the platform has been built, notice the "export" directory. This directory holds the complete built platform, ready to be zipped and shared, giving new developers everything they need to work with this custom platform.
16 Create a face detection application project
For the final application, we target the MIPI platform with a machine learning application. We use the pre-built Xilinx Deep Learning Processor Unit (DPU) as the acceleration kernel, compile it into the platform using the Xilinx Vitis IDE, and then build a user-space application that invokes the hardware to run a custom face detection application.
17 Create a new application project
Start by creating a new application project. In the Vitis tools, application projects are kept in a system project container to provide a means for cohesive system development across the various enabled domains in a platform (for example, A53 and R5). Since we are working in the same workspace as before, it is easy to target the platform we generated previously, but you can also add other platform repositories by clicking the "+" button in the "Platform Selection" dialog and pointing to the directory containing the .xpfm file.
- Open the Vitis IDE and select the top-level workspace directory as the workspace
- Select File > New > Application Project
- Name the project "face_detection" and use the automatically generated system project name
- Select the "u96v1_mipi" platform you just created
- Confirm that the Linux domain is selected on the next screen, and point to the sysroot generated in sw_platform
- Select "Empty Application" as the template and click Finish
18 Edit Build Settings
- Right-click [face_detection/src] in the file navigator and select "Import"
- Select "General" then "File System"
- Use [reference_files/application] as the source location and import all of the files there
- Right-click face_detection in the file navigator and select C/C++ Build Settings
- If you are not in the C/C++ Build Settings menu, navigate to it
- For Configuration, select All Configurations
- In the GCC Host Linker > Libraries submenu, click the green "+" to add the following libraries:
- n2cube
- dputils
- opencv_core
- opencv_imgcodecs
- opencv_highgui
- opencv_imgproc
- opencv_videoio
- In the Host Linker > Miscellaneous > Other Objects submenu, add dpu_densebox.elf from the project's src directory
- In the host compiler Includes section, select the XILINX_VIVADO_HLS include path and click the red "X" to delete it
- Click Apply and Close
19 Add the DPU as a hardware accelerator
Finally, we add the DPU as a hardware acceleration kernel and use Vitis to connect and compile the design.
- Double-click project.sdx under the face_detection project to open the project view
- Under Hardware Functions, click the "lightning bolt" icon to add a new accelerator
- Select the "dpu_xrt_top" kernel
- Click binary_container_1 and change the name to dpu
- Right-click "dpu" and select the "Edit V++ Options" option
- Add --config ../src/connections.ini to specify which DPU ports are connected to the platform interfaces created above
- In the top right corner, change the active build configuration to "System"
- Click the "hammer" icon to build the project
This may take 30 minutes or more, depending on the computer you are using for the build. You may have noticed that we never took the hardware portion of the design through bitstream generation. When run, the Vitis tool uses the "open" interfaces in the hardware design, imports the DPU into the design, and connects those interfaces to match what is called out in connections.ini. Once the design and new components are stitched together, synthesis and implementation are run to generate the binary that is loaded into the fabric.
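For reference, a connections.ini in this flow is a v++ configuration file whose [connectivity] section maps each kernel AXI master port to one of the sptag names defined on the platform. The DPU instance and port names below are assumptions for illustration only; the file shipped in [reference_files/application] is authoritative:

```ini
[connectivity]
; Map the DPU's three memory-mapped masters to the platform sptags (HP0-HP2).
; Instance and port names here are illustrative; check the provided connections.ini.
sp=dpu_xrt_top_1.M_AXI_GP0:HP0
sp=dpu_xrt_top_1.M_AXI_HP0:HP1
sp=dpu_xrt_top_1.M_AXI_HP2:HP2
```

Each sp= line has the form kernel_instance.port:sptag, which is how the tool knows to wire the DPU masters to the HP slave ports we opened in section 06.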
20 Running the Application on Ultra96
After the build process is complete, you will find a supplementary sd_card folder under the system directory of the project. Copy the sd_card image to a formatted SD card to boot the board. Once the board has successfully booted, you can follow a few quick steps to run the design.
- On the board, change directory to [/run/media/mmcblk0p1/]
- Copy the dpu.xclbin file to /usr/lib
- Run face_detection.elf
When run without arguments, the face_detection application prints a help dialog containing example pipelines (MIPI, webcam, UDP streaming) that can be run through the application. These are provided to the application as GStreamer-style sink strings, allowing easy customization of the face detection application.
Example pipelines:
"./face_detection -i /dev/video0 -o autovideosink" will display over X11 forwarding or on a local monitor
"./face_detection -i /dev/video0 -o udpsink host=192.168.1.50 port=8080" will stream over UDP