How to perform 3D object detection on LiDAR point clouds
This project will leverage the PYNQ-DPU overlay on the KV260, enabling us to do 3D object detection on LiDAR point clouds more efficiently than ever before!
Background
Environmental perception plays an integral role in building self-driving cars, autonomous navigating robots, and other real-world applications.
Why 3D Object Detection on Point Clouds?
While deep learning-based 2D object detection from camera data achieves high accuracy, it is less suited to tasks such as localization, measuring distances between objects, and estimating depth.
The point cloud generated by a LiDAR sensor provides 3D information about objects, making it possible to locate them and characterize their shapes far more effectively. 3D object detection on point clouds is therefore emerging in many applications, especially autonomous driving.
Nevertheless, designing LiDAR-based 3D object detection systems is challenging. First, such systems require a great deal of computation for model inference. Second, because point cloud data is irregular, the processing pipeline requires pre-processing and post-processing stages to deliver end-to-end perception results.
The KV260 is a perfect match for 3D object detection systems: the expensive computation of model inference can be offloaded to and accelerated by the programmable logic portion of the KV260, while its powerful ARM cores handle the pre-processing and post-processing tasks.
Design Overview
We now describe the deep learning model selected for 3D object detection on point clouds and give an overview of the system, covering both software and hardware.
Network Architecture
After surveying existing work, we chose the ResNet-based Keypoint Feature Pyramid Network (KFPN), the first real-time system for monocular 3D detection with state-of-the-art performance on the KITTI benchmark. In particular, we adopted its open-source PyTorch implementation on point clouds, called SFA3D.
PYNQ-DPU on KV260
We use Ubuntu Desktop 20.04.3 LTS for Xilinx development boards, rather than Petalinux, as the operating system on the KV260 because Ubuntu is a convenient development environment for installing the packages needed to pre-process point clouds and post-process the results. Meanwhile, the KV260's support for PYNQ and the DPU overlay spares us from designing an efficient DPU from scratch and lets us work in a Python environment. This greatly simplifies migrating CPU/GPU-based deep learning implementations to the KV260.
Setting up the environment
Follow the official guide to install the Ubuntu image on the KV260, then refer to the GitHub instructions to install PYNQ under Ubuntu. Clone all the required files and install the required packages on the board by executing the following commands:
git clone https://github.com/SoldierChen/DPU-Accelerated-3D-Object-Detection-on-Point-Clouds.git
cd DPU-Accelerated-3D-Object-Detection-on-Point-Clouds
pip install -r requirements.txt
Here, we need PyTorch 1.4 because the VART of PYNQ DPU is v1.4.
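A quick way to confirm the environment matches this pin (a minimal sanity check; it only verifies that the imports resolve and that the PyTorch version is the expected one):
import torch
import pynq_dpu  # fails here if the PYNQ-DPU stack is not installed

assert torch.__version__.startswith("1.4"), torch.__version__
print("PyTorch", torch.__version__, "and pynq_dpu are available")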
Data preparation
The data to download, from the KITTI 3D object detection dataset, includes the following; a quick check of the expected folder layout is sketched after the list:
Velodyne Point Cloud (29 GB)
Training labels for object dataset (5 MB)
Camera calibration matrices for the object dataset (16 MB)
Left color image of the object dataset (12 GB) (for visualization purposes only)
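Before going further, it is worth confirming that the files are unpacked where the code expects them. The layout below follows the SFA3D convention; treat the exact root path as an assumption and adjust it to your checkout:
import os

# expected KITTI layout (SFA3D convention; the root path is an assumption)
ROOT = "dataset/kitti/training"
for sub in ("velodyne", "label_2", "calib", "image_2"):
    path = os.path.join(ROOT, sub)
    count = len(os.listdir(path)) if os.path.isdir(path) else 0
    print(f"{path}: {count} files")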
To visualize the 3D point clouds with 3D boxes overlaid, execute:
cd model_quant_compile/data_process/
python kitti_dataset.py
Model Training
To train the model, run:
python train.py --gpu_idx 0
This command uses a single GPU for training, but distributed training is also supported. In addition, you can choose fpn_resnet or resnet as the target model. The trained model will be stored in a checkpoint folder named "Model_resnet/fpn_resnet_epoch_#". Depending on your hardware, you can train for anywhere from 10 to 300 epochs; more epochs generally yield higher accuracy.
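If you later need to load the trained weights (for example, for quantization), a minimal sketch along these lines works, assuming the checkpoint naming quoted above (adjust the glob pattern to your run):
import glob
import os
import torch

# pick the newest checkpoint produced by train.py
# (the pattern mirrors the folder naming quoted above -- an assumption)
ckpts = sorted(glob.glob("Model_resnet/fpn_resnet_epoch_*.pth"), key=os.path.getmtime)
state_dict = torch.load(ckpts[-1], map_location="cpu")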
Model quantization and compilation
Similarly, since PYNQ's VART is v1.4, we need Vitis AI v1.4 rather than the latest version (v2.0) to perform model quantization.
# install docker first (if not already installed)
docker pull xilinx/vitis-ai-cpu:1.4.1.978
# run the docker
./docker_run.sh xilinx/vitis-ai-cpu:1.4.1.978
We then quantize the model using the following command:
# activate the pytorch environment
conda activate vitis-ai-pytorch
# install required packages
pip install -r requirements.txt
# set the quant_mode default to calib in quantize.py
ap.add_argument('-q', '--quant_mode', type=str, default='calib', choices=['calib','test'], help='Quantization mode (calib or test). Default is calib')
# this quantizes the example model: Model_resnet_18_epoch_10.pth
python quantize.py
# then set the quant_mode default to test
ap.add_argument('-q', '--quant_mode', type=str, default='test', choices=['calib','test'], help='Quantization mode (calib or test). Default is calib')
# this outputs the quantized model
python quantize.py
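For context, the core of quantize.py follows the standard Vitis AI 1.4 PyTorch flow built on pytorch_nndct. The sketch below shows that flow in minimal form; the build_model helper and the input shape are assumptions, and the repository's script may differ in its details:
import torch
from pytorch_nndct.apis import torch_quantizer

quant_mode = "calib"  # switch to "test" on the second pass
model = build_model()  # assumed helper: builds SFA3D and loads Model_resnet_18_epoch_10.pth
dummy_input = torch.randn(1, 3, 608, 608)  # assumed BEV input shape

quantizer = torch_quantizer(quant_mode, model, (dummy_input,))
quant_model = quantizer.quant_model
# in practice, run real calibration batches through quant_model here
quant_model(dummy_input)
if quant_mode == "calib":
    quantizer.export_quant_config()              # writes the calibration results
else:
    quantizer.export_xmodel(deploy_check=False)  # writes the quantized xmodel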
Next, we will compile the model:
./compile.sh zcu102 build/
Don't worry about the zcu102 target: it shares the same DPU architecture as the KV260. A message confirming success will be printed when compilation completes.
So far, we have a compiled xmodel that can be executed by the DPU on the KV260. Next, we deploy it on the board and develop the application code.
KV260 deployment
Following the official guide, we first installed the Ubuntu operating system on the KV260. Then, we installed PYNQ on the board according to the PYNQ-DPU GitHub instructions.
After setting up the board, install git, clone the code onto the board, and copy the compiled xmodel into the project folder.
Application code design
Here we will describe how to call and interface with the DPU for inference.
We first load the DPU overlay and the customized xmodel. It is then important to know the input and output tensor information so that we can coordinate with the dataset: here, we have one input tensor and five output tensors. The input and output buffers are allocated accordingly.
import numpy as np
from pynq_dpu import DpuOverlay

# load the overlay and the compiled model
overlay = DpuOverlay("dpu.bit")
overlay.load_model("./CNN_zcu102.xmodel")
dpu = overlay.runner
# get tensor information
inputTensors = dpu.get_input_tensors()
outputTensors = dpu.get_output_tensors()
shapeIn = tuple(inputTensors[0].dims)
outputSize = int(outputTensors[0].get_data_size() / shapeIn[0])
shapeOut = tuple(outputTensors[0].dims)
shapeOut1 = tuple(outputTensors[1].dims)
shapeOut2 = tuple(outputTensors[2].dims)
shapeOut3 = tuple(outputTensors[3].dims)
shapeOut4 = tuple(outputTensors[4].dims)
# allocate input and output buffers;
# note the output is a list of five tensors
output_data = [np.empty(shapeOut, dtype=np.float32, order="C"),
               np.empty(shapeOut1, dtype=np.float32, order="C"),
               np.empty(shapeOut2, dtype=np.float32, order="C"),
               np.empty(shapeOut3, dtype=np.float32, order="C"),
               np.empty(shapeOut4, dtype=np.float32, order="C")]
# the input is a single tensor
input_data = [np.empty(shapeIn, dtype=np.float32, order="C")]
image = input_data[0]
A single inference pass is encapsulated in the function below. Here, we permute the input tensor to the NHWC layout expected by the DPU and permute the outputs back to the NCHW layout required for post-processing. Getting these permutations right is critical for correct results.
import numpy as np
import torch
# _sigmoid, decode, and post_processing come from the SFA3D code base

def do_detect(dpu, shapeIn, image, input_data, output_data, configs, bevmap, is_front):
    if not is_front:
        bevmap = torch.flip(bevmap, [1, 2])
    input_bev_maps = bevmap.unsqueeze(0).to("cpu", non_blocking=True).float()
    # permute from NCHW to the NHWC layout expected by the DPU
    input_bev_maps = input_bev_maps.permute(0, 2, 3, 1)
    image[0, ...] = input_bev_maps[0, ...]
    # run inference on the DPU
    job_id = dpu.execute_async(input_data, output_data)
    dpu.wait(job_id)
    # convert the output arrays to tensors for the following post-processing
    outputs0 = torch.tensor(output_data[0])
    outputs1 = torch.tensor(output_data[1])
    outputs2 = torch.tensor(output_data[2])
    outputs3 = torch.tensor(output_data[3])
    outputs4 = torch.tensor(output_data[4])
    # permute back from NHWC to NCHW
    outputs0 = outputs0.permute(0, 3, 1, 2)
    outputs1 = outputs1.permute(0, 3, 1, 2)
    outputs2 = outputs2.permute(0, 3, 1, 2)
    outputs3 = outputs3.permute(0, 3, 1, 2)
    outputs4 = outputs4.permute(0, 3, 1, 2)
    outputs0 = _sigmoid(outputs0)
    outputs1 = _sigmoid(outputs1)
    # post-processing
    detections = decode(outputs0, outputs1, outputs2, outputs3, outputs4, K=configs.K)
    detections = detections.cpu().numpy().astype(np.float32)
    detections = post_processing(detections, configs.num_classes, configs.down_ratio, configs.peak_thresh)
    return detections[0], bevmap
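The bevmap argument above is produced by SFA3D's pre-processing, which rasterizes the point cloud into a three-channel bird's-eye-view image (intensity, height, and density). A minimal sketch of that style of rasterization follows; the 608x608 grid and the 50 m x 50 m range follow SFA3D's defaults, but treat the exact boundaries and normalization as assumptions:
import numpy as np

def make_bev_map(points, h=608, w=608, x_range=(0.0, 50.0), y_range=(-25.0, 25.0), z_max=3.0):
    # points: (N, 4) array of x, y, z, intensity in LiDAR coordinates
    x, y, z, r = points[:, 0], points[:, 1], points[:, 2], points[:, 3]
    keep = (x >= x_range[0]) & (x < x_range[1]) & (y >= y_range[0]) & (y < y_range[1])
    x, y, z, r = x[keep], y[keep], z[keep], r[keep]
    # discretize metric coordinates into pixel indices
    xi = ((x - x_range[0]) / (x_range[1] - x_range[0]) * h).astype(np.int64)
    yi = ((y - y_range[0]) / (y_range[1] - y_range[0]) * w).astype(np.int64)
    bev = np.zeros((3, h, w), dtype=np.float32)
    np.maximum.at(bev[0], (xi, yi), r)          # intensity channel
    np.maximum.at(bev[1], (xi, yi), z / z_max)  # normalized height channel
    np.add.at(bev[2], (xi, yi), 1.0)            # point count per cell
    bev[2] = np.minimum(1.0, np.log(bev[2] + 1.0) / np.log(64.0))  # density channel
    return bev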
Execute on KV260
Inference on the demo data will be performed on the DPU by running the following command:
python demo_2_sides-dpu.py
Then run the following command:
python demo_front-dpu.py
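To gauge throughput yourself, it is enough to wrap the detection loop with a timer. A minimal sketch, reusing the overlay, buffers, and do_detect defined earlier, and assuming a hypothetical iterable demo_dataset of pre-processed (bevmap, is_front) pairs:
import time

frames = 0
start = time.time()
for bevmap, is_front in demo_dataset:  # hypothetical iterable of pre-processed BEV maps
    detections, _ = do_detect(dpu, shapeIn, image, input_data, output_data, configs, bevmap, is_front)
    frames += 1
print(f"{frames / (time.time() - start):.1f} FPS")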
Performance ranges from 10 to 20 FPS, which is 100 to 200 times faster than execution on a server-grade CPU (Intel Xeon Gold 6226R).
Conclusion
In summary, we have shown how easy it is to use the AMD-Xilinx DPU on the KV260 to accelerate point cloud based 3D object detection. To further improve performance, we plan to optimize the model inference stage by using multiple DPU instances, and the pre-processing and post-processing stages by using multi-threading and batching.