Exploring the overall architecture of driverless cars - Perception

Publisher: EEWorld News | Last updated: 2020-04-15 | Source: EEWORLD

As a major research direction for the automobile industry, driverless cars will have a profound impact on carmaking and on transportation as a whole. Their arrival promises to free drivers' hands, reduce the frequency of traffic accidents, and improve safety. As core technologies such as artificial intelligence and sensing continue to advance, driverless cars will become more intelligent and can move toward industrial-scale deployment. Although the industry consensus is to treat a driverless car as a robot and to design the system with methods developed for robotics, some approaches instead rely purely on artificial intelligence or on intelligent agents. Among these, end-to-end driving based on deep learning and driving agents based on reinforcement learning are current research hotspots.



Let’s explore the mysteries inside the driverless car.

The functional modules of an unmanned vehicle include perception, positioning, path planning, behavior decision-making, and trajectory planning and execution, supported by V2X and high-precision maps. How these modules interact with one another, with the vehicle hardware, and with other vehicles can be represented by the following figure:

Perception is the core module and the most studied module. It is simply divided into three parts:

The first is self-perception, that is, the vehicle's own state: its pose at the road-network level, its pose at the lane level, and its fine-grained state.

The second is static target perception, including road signs and static obstacles.

The third is dynamic target perception, including dynamic target detection, tracking and prediction of its motion trajectory.

Path planning is divided into several levels. At the top is global navigation over the road network, refreshed on a cycle of 10 seconds or more. Below that is local path planning, refreshed every 0.1-10 seconds; it includes lane-level path planning and obstacle-avoidance planning. At the bottom is trajectory planning, refreshed at 10-50 Hz.
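To make the hierarchy concrete, here is a minimal sketch of a planning loop that refreshes each layer on its own cycle. The planner callbacks and the exact periods are illustrative placeholders, not part of any real driving stack.

```python
import time

# Illustrative refresh periods for the three planning layers described above.
REFRESH_PERIOD_S = {
    "global_route": 10.0,   # road-network navigation: a cycle of 10 s or more
    "local_path":    1.0,   # lane-level / obstacle-avoidance planning: 0.1-10 s
    "trajectory":    0.02,  # trajectory planning: 10-50 Hz
}

# Hypothetical planner callbacks; a real stack would do actual work here.
PLANNERS = {
    "global_route": lambda: "route over the road network",
    "local_path":   lambda: "lane-level path around obstacles",
    "trajectory":   lambda: "smooth, drivable trajectory",
}

def planning_loop(duration_s: float = 2.0) -> None:
    """Run each planner whenever its refresh period has elapsed."""
    last_run = {name: 0.0 for name in REFRESH_PERIOD_S}
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        now = time.monotonic()
        for name, period in REFRESH_PERIOD_S.items():
            if now - last_run[name] >= period:
                PLANNERS[name]()          # refresh this planning layer
                last_run[name] = now
        time.sleep(0.005)                 # yield briefly between checks

if __name__ == "__main__":
    planning_loop()
```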

Perception is a key step in autonomous driving and is also the most researched step at present. It is divided into two parts:

Static features of the road: zebra crossings, lane lines, road markings, roadside traffic signs, curbs, and so on; together they form the mathematical model of the road itself. Dynamic features are the traffic elements on the road: vehicles, pedestrians, traffic lights, and other obstacles.

The goal is to accurately determine the 3D pose and size of vehicles, pedestrians, and traffic lights, that is, to draw a 3D bounding box around each of them. This is the basis for obstacle-avoidance planning. It must be done with a LiDAR or a binocular camera, since a monocular camera cannot obtain 3D information directly. There are currently two main methods:

1. Purely LiDAR-based: several multi-line LiDARs are used, the point clouds are converted into volume elements (voxels), and the voxels are then processed. Apple's approach is the typical example.

2. Fuse LiDAR and camera data: use vision to identify obstacle boundaries and use the LiDAR point cloud to obtain 3D information (a minimal sketch of this projection step follows the comparison below).

Most manufacturers, including Waymo and Baidu, choose the latter route. The former is less mature and less widely used, but in principle it is more reliable and uses less computing power, at the cost of demanding more from the LiDAR: Apple uses twelve 16-line LiDARs. The latter is less demanding on the LiDAR (32 lines are enough) and technically more mature, but it is somewhat less reliable and consumes a lot of computing resources.
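As a rough illustration of the second (fusion) route, the sketch below projects LiDAR points into the camera image with a calibration matrix, so that pixels a vision model has flagged as obstacles can be given 3D coordinates from the point cloud. The projection matrix and the obstacle mask are assumed inputs; this is a sketch of the idea, not any particular vendor's implementation.

```python
import numpy as np

def project_points(points_xyz: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Project (N, 3) LiDAR points to (N, 2) pixel coordinates with a 3x4 matrix P."""
    homogeneous = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])  # (N, 4)
    uvw = homogeneous @ P.T                                                   # (N, 3)
    # Divide by depth; assumes the points passed in are in front of the camera.
    return uvw[:, :2] / uvw[:, 2:3]

def lift_obstacle_mask_to_3d(points_xyz: np.ndarray, P: np.ndarray,
                             obstacle_mask: np.ndarray) -> np.ndarray:
    """Keep the 3D points whose projection lands on pixels marked as obstacles."""
    uv = np.round(project_points(points_xyz, P)).astype(int)
    h, w = obstacle_mask.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    hits = inside.copy()
    hits[inside] = obstacle_mask[uv[inside, 1], uv[inside, 0]]  # mask indexed [row, col]
    return points_xyz[hits]
```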

LiDAR is a device that uses laser light for detection and ranging. It can emit millions of light pulses into the environment every second, and its rotating internal structure lets it build a 3D map of the surroundings in real time. A LiDAR typically scans the environment at about 10 Hz; each scan produces a dense set of points, each carrying (x, y, z) coordinates, known as a point cloud. The following figure shows a point cloud built with a Velodyne VLP-32C LiDAR:

Thanks to its reliability, LiDAR remains the most important sensor in driverless systems. In practice, however, it is not perfect: point clouds can be too sparse or even miss some returns, patterns on irregular surfaces are hard to identify, and LiDAR cannot be used in conditions such as heavy rain.

The raw point cloud from the LiDAR and the raw image from the camera are processed by PointNet and ResNet respectively. PointNet produces both global features and per-point features, which are then fused with the image features (dense fusion); the per-point branches and a global fusion branch finally yield the 3D bounding boxes.
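The PyTorch sketch below shows the dense-fusion idea only: per-point LiDAR features and an image feature vector are concatenated for every point, and a small head regresses box parameters per point. The feature extractors are stand-ins for PointNet and ResNet, and the layer sizes and the 7-value box encoding (x, y, z, l, w, h, yaw) are illustrative assumptions, not the exact network described above.

```python
import torch
import torch.nn as nn

class DenseFusionHead(nn.Module):
    """Toy fusion head: concatenate point, global, and image features per point."""
    def __init__(self, point_dim=64, global_dim=128, img_dim=256, box_dim=7):
        super().__init__()
        # One box = (x, y, z, l, w, h, yaw) regressed for every point.
        self.mlp = nn.Sequential(
            nn.Linear(point_dim + global_dim + img_dim, 128),
            nn.ReLU(),
            nn.Linear(128, box_dim),
        )

    def forward(self, point_feats, global_feat, img_feat):
        # point_feats: (B, N, point_dim); global_feat: (B, global_dim);
        # img_feat: (B, img_dim) pooled from a CNN such as ResNet.
        n = point_feats.shape[1]
        fused = torch.cat(
            [point_feats,
             global_feat.unsqueeze(1).expand(-1, n, -1),
             img_feat.unsqueeze(1).expand(-1, n, -1)], dim=-1)
        return self.mlp(fused)   # (B, N, box_dim): per-point box proposals
```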

A quick primer:

ResNet is a residual network: its residual blocks are sub-networks that can be stacked to build very deep networks.
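A minimal residual block in PyTorch, sketched only to show the skip connection that makes such deep stacking trainable; the channel count and layer choices are illustrative, not the configuration of any published ResNet.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions whose output is added back to the block's input."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: output = F(x) + x
```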

PointNet is a deep learning framework for point cloud classification and segmentation proposed by Stanford University in 2016. Point clouds have irregular spatial structure, so existing image classification and segmentation frameworks cannot be applied to them directly. Many deep learning frameworks therefore voxelize (grid) the point cloud first, and they have achieved good results; but voxelization inevitably alters the original characteristics of the point cloud data, causing unnecessary information loss and extra work. PointNet instead takes the raw point cloud as input, preserving the spatial features of the points as much as possible, and achieved good results in its final evaluation.
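The core trick is a shared per-point MLP followed by a symmetric max-pool, which makes the network independent of point order and lets it consume raw (x, y, z) points without voxelization. The tiny PyTorch sketch below illustrates only that idea; it omits the input and feature transform networks of the real PointNet, and its layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Shared per-point MLP + order-invariant max pooling + classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.point_mlp = nn.Sequential(        # applied to every point independently
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, points):
        # points: (B, N, 3) raw point cloud
        per_point = self.point_mlp(points)          # (B, N, 256) per-point features
        global_feat = per_point.max(dim=1).values   # (B, 256) order-invariant pooling
        return self.classifier(global_feat)         # (B, num_classes)
```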

Apple's pure LiDAR perception algorithm

LiDAR and machine vision are both being researched by companies working on autonomous driving. Technically, LiDAR works by measuring the time it takes an emitted laser pulse to reach the surface of an object and return, and from these measurements it traces the outer contours of three-dimensional objects. Machine vision recognizes objects from images.
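In other words, each return is converted to a range by halving the round-trip time and multiplying by the speed of light; a minimal example with an assumed round-trip time is below.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def range_from_time_of_flight(round_trip_s: float) -> float:
    """Distance to the reflecting surface: half the round trip at the speed of light."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

print(range_from_time_of_flight(200e-9))  # a 200 ns round trip is roughly 30 m
```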

As mentioned earlier, not many companies build perception entirely on LiDAR, but the technology giant Apple was among the first to try. In 2017 its researchers published a paper titled "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection", which describes a pure-LiDAR perception algorithm for unmanned vehicles.

In the paper, the Apple researchers note that the current technical difficulty is that when LiDAR traces the outer contours of three-dimensional objects, smaller objects at long range are hard to identify, and that engineers still need to organize the raw laser points by hand to make subsequent recognition easier.

The 64-line LiDAR mounted on an unmanned vehicle rotates about ten times per second, and each rotation collects roughly 100,000 reflection points, stored as a raw point cloud. In the KITTI dataset, about 7,000 frames of driving data collected in Germany are annotated with 3D bounding boxes; the annotated categories include cars, pedestrians, and cyclists. Each point in a raw point cloud carries only its coordinates (x, y, z) and reflection intensity, and 3D bounding box prediction uses only this point cloud information.
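For reference, a KITTI Velodyne sweep is stored as a flat binary file of float32 values, four per point (x, y, z, intensity), matching the per-point information described above. The sketch below reads one such file; the file name is a hypothetical local copy of a KITTI scan.

```python
import numpy as np

def load_kitti_scan(path: str) -> np.ndarray:
    """Return an (N, 4) array of x, y, z, intensity for one LiDAR sweep."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

scan = load_kitti_scan("000000.bin")   # hypothetical local copy of one scan
print(scan.shape)                      # on the order of (100000, 4) for a 64-line sweep
```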

The network structure of VoxelNet consists of three functional modules: (1) a feature learning layer; (2) convolutional middle layers; (3) a region proposal network (RPN).

The VoxelNet they proposed also removes much of the manual processing. VoxelNet recognizes objects by grouping the points returned by the LiDAR, analyzing the shape information each group describes, combining the results, and using the region proposal network (RPN) to generate the final detections.
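A minimal sketch of the grouping step this description starts from: binning raw points into a regular 3D voxel grid so that later layers can learn a feature per occupied voxel. The voxel size and point cloud range below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from collections import defaultdict

def voxelize(points: np.ndarray,
             voxel_size=(0.2, 0.2, 0.4),
             range_min=(0.0, -40.0, -3.0)) -> dict:
    """Group (N, >=3) points into voxels keyed by integer (ix, iy, iz) indices."""
    voxels = defaultdict(list)
    indices = np.floor((points[:, :3] - np.array(range_min)) / np.array(voxel_size)).astype(int)
    for point, key in zip(points, map(tuple, indices)):
        voxels[key].append(point)   # each occupied voxel keeps its own points
    return voxels

# Example: a random cloud of 1,000 points in front of the vehicle.
cloud = np.random.uniform([0, -40, -3, 0], [70, 40, 1, 1], size=(1000, 4)).astype(np.float32)
print(len(voxelize(cloud)), "occupied voxels")
```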

Apple says the technology could also help in developing hardware products such as autonomous navigation devices and robot vacuums. When evaluating VoxelNet, however, the researchers tested it only on the KITTI computer vision benchmark and did not deploy it on actual test vehicles.

Apple is undoubtedly interested in autonomous driving, and Tim Cook has mentioned it in several public speeches. The only confirmed news is that California Department of Motor Vehicles records show Apple has obtained a license for autonomous driving road tests and has recruited former NASA and Tesla employees as road test drivers.

The car project is codenamed Titan, but what the vehicle looks like, and whether Apple is building a complete self-driving car or just a self-driving system, remains a matter of rumor. At the time, however, Apple had already bought a good deal of land in Silicon Valley for the project.

MacCallister Higgins, founder of self-driving company Voyage, claimed that he saw Apple's self-driving car, which appeared to be equipped with 6 lidar sensors on the top of the car.

This paper makes Apple's research results on driverless technology quite clear. Apple has long been committed to this research, and having obtained the driverless testing license issued by the California Department of Motor Vehicles, its test vehicles are ready.

The paper details how the Apple researchers, including authors Yin Zhou and Oncel Tuzel, developed VoxelNet, which can infer what an object is from the cloud of points captured by a LiDAR. A LiDAR works by emitting laser pulses in all directions, recording the returns, and building a high-resolution point map.

This research is interesting because it makes LiDAR play a much larger role in the driverless system. LiDAR sensing data is normally combined with optical cameras, radar, and other sensors to paint a complete picture of object detection; being able to rely on LiDAR alone with high confidence would help bring driverless vehicles into real-world production and improve computational efficiency.

References:

1. "Smart Driving Hardware Guide" - Zhou Yanwu

2. Understand the basic framework of driverless car system in one article

3. Apple LiDAR Autonomous Vehicle Perception VoxelNet

