Panoramic/fisheye camera close-range perception for low-speed autonomous driving

Publisher: 平和的心情 | Last updated: 2022-11-28 | Source: elecfans

Cameras are the primary sensors in autonomous driving systems: they provide high information density and are best suited to detecting the road infrastructure that was designed for human vision. Panoramic (surround-view) camera systems typically comprise four fisheye cameras, each with a field of view of 190° or more, together covering the full 360° around the vehicle and focusing on near-field perception. They are the primary sensors for low-speed, high-precision, close-range applications such as automated parking, traffic jam assistance, and low-speed emergency braking.

The paper presents a detailed survey of such vision systems and examines them in the context of an architecture that decomposes into four modular components (recognition, reconstruction, relocalization, and reorganization), collectively referred to as the 4R architecture. It discusses how each component covers a specific aspect of the problem and makes the position argument that, working together, they can form a complete low-speed automated driving perception system.


The work in this paper is partly inspired by the work of Malik et al. [5], who proposed that the core problems of computer vision are reconstruction, recognition, and reorganization, which they call the 3Rs of computer vision. The paper proposes to expand and specialize the 3Rs into the 4Rs of computer vision for autonomous driving: reconstruction, recognition, reorganization, and relocalization.


Reconstruction means inferring the scene geometry from a video sequence, including the position of the vehicle in the scene. The importance of this should be obvious, as it is critical for problems such as scene rendering, obstacle avoidance, maneuvering, and vehicle control. Malik et al. extend this beyond geometry inference to include properties such as reflections and lighting. However, these additional properties are not (at least for now) important in the context of autonomous driving computer vision, so the paper defines reconstruction as 3D geometry recovery in the more traditional sense.


Recognition means attaching semantic labels to aspects of a video image or scene, and it includes hierarchical structure. For example, a cyclist has a spatial hierarchy, since it can be decomposed into a bicycle and a rider, while the vehicle category can have subcategories such as cars, trucks, and bicycles. The hierarchy can be extended for as long as it remains useful to the autonomous system. Lights can be classified by type (headlights, street lights, brake lights, etc.), by color (red, yellow, green), and by their relevance to the autonomous vehicle (must respond vs. can be ignored), which supports the system's higher-level reasoning.
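As a rough illustration of what such a label hierarchy might look like in practice, here is a minimal Python sketch; the class names and groupings are illustrative assumptions, not taken from any particular dataset or from the paper.

```python
# Minimal sketch of a hierarchical label space for recognition.
# Class names and groupings are illustrative only, not from a specific dataset.

TAXONOMY = {
    "vehicle": ["car", "truck", "bus", "motorcycle"],
    "vulnerable_road_user": ["pedestrian", "cyclist"],  # cyclist = bicycle + rider
    "light": {
        "type": ["headlight", "street_light", "brake_light"],
        "color": ["red", "yellow", "green"],
        "relevance": ["must_respond", "can_ignore"],
    },
}

def parent_of(label):
    """Return the top-level category a fine-grained label belongs to."""
    for parent, children in TAXONOMY.items():
        flat = children if isinstance(children, list) else sum(children.values(), [])
        if label in flat:
            return parent
    return None

print(parent_of("cyclist"))  # -> "vulnerable_road_user"
```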


Relocalization refers to place recognition and metric localization of the vehicle relative to its surroundings. It can be performed against trajectories pre-recorded in the host vehicle, e.g., a trained parking route, or against maps provided by infrastructure, e.g., HD maps. It is closely related to loop closure in SLAM, although it addresses not just the loop-closure problem but the broader problem of localizing the vehicle against one or more predefined maps.
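A minimal sketch of the pre-recorded-trajectory case is shown below, assuming poses are stored simply as 2D positions; a real system would match visual landmarks or map features rather than raw coordinates, and all names and thresholds here are illustrative.

```python
import numpy as np

# Minimal sketch of relocalization against a pre-recorded trajectory
# (e.g. a trained parking route). Real systems match visual landmarks or
# map features; here only a nearest-pose lookup on stored (x, y) positions
# is shown. Function name and threshold are illustrative assumptions.

def relocalize(current_xy, recorded_trajectory, max_dist=2.0):
    """Return (index, distance) of the closest recorded pose, or None if the
    vehicle is too far from the trajectory to relocalize."""
    traj = np.asarray(recorded_trajectory, dtype=float)          # shape (N, 2)
    dists = np.linalg.norm(traj - np.asarray(current_xy, dtype=float), axis=1)
    idx = int(np.argmin(dists))
    if dists[idx] > max_dist:
        return None
    return idx, float(dists[idx])

trajectory = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.5), (3.0, 1.0)]
print(relocalize((2.1, 0.4), trajectory))   # -> (2, ~0.14)
```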


Reorganization is the process of combining the information from the other three components into a unified representation. In this article the term is used as equivalent to "late fusion", an important step for autonomous driving: vehicle control requires a unified representation of the sensor outputs, and late fusion also allows the outputs of multiple cameras to be merged at a late stage of the pipeline.
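To make the late-fusion reading of reorganization concrete, the following sketch merges per-camera detections into a single ego-vehicle frame; the camera extrinsics and the detection format are assumptions chosen for illustration only.

```python
import numpy as np

# Minimal late-fusion sketch for reorganization: detections from the four
# surround-view cameras, each given in its own camera frame, are transformed
# into a single ego-vehicle frame and merged. Extrinsics and the detection
# format are illustrative assumptions.

CAMERA_EXTRINSICS = {               # 2D pose of each camera in the ego frame:
    "front": (3.7, 0.0, 0.0),       # (x [m], y [m], yaw [rad])
    "rear":  (-1.0, 0.0, np.pi),
    "left":  (2.0, 0.9, np.pi / 2),
    "right": (2.0, -0.9, -np.pi / 2),
}

def to_ego_frame(cam_name, detection_xy):
    """Transform a (forward, lateral) detection from a camera frame into the ego frame."""
    cx, cy, yaw = CAMERA_EXTRINSICS[cam_name]
    x, y = detection_xy
    ex = cx + np.cos(yaw) * x - np.sin(yaw) * y
    ey = cy + np.sin(yaw) * x + np.cos(yaw) * y
    return ex, ey

def fuse(per_camera_detections):
    """Late fusion: merge per-camera detections into one ego-centric list."""
    fused = []
    for cam, dets in per_camera_detections.items():
        for label, xy in dets:
            fused.append((label, to_ego_frame(cam, xy)))
    return fused

print(fuse({"front": [("pedestrian", (5.0, 1.0))],
            "left":  [("car", (2.0, 0.0))]}))
```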


Introduction to Near-Field Perception Systems

Automatic parking system

Automated parking systems are one of the main use cases for short-range sensing; Figure 4 depicts some typical parking scenarios. Early commercial semi-automated parking systems used ultrasonic sensors or radar; more recently, however, surround-view cameras have become one of the main sensors for automated parking. A major limitation of ultrasonic and mmWave radar sensors for this task is that parking spaces can only be identified from the presence of other obstacles (Figure 5). Surround-view camera systems, by contrast, also allow parking relative to visual markings such as painted lines, and are seen as a key technology for enabling valet parking.


Traffic Jam Assist

Since the majority of accidents are low-speed rear-end collisions, traffic jams are considered one of the driving domains where benefits can be achieved in the short term, although current systems may lack robustness. In an automated traffic jam assistance system, the vehicle controls its longitudinal and lateral position within a traffic jam (Figure 6). The feature is typically used in low-speed environments, with a maximum speed of about 60 km/h, though a lower maximum of 40 km/h is recommended.

While traffic jam assistance typically considers highway scenarios, urban traffic jam assistance systems have been investigated. Given the low-speed nature of this application, surround-view cameras are ideal sensors, especially in urban environments, where, for example, pedestrians can try to cross from areas outside the field of view of a traditional forward-facing camera or radar system. Figure 7 shows an example of traffic jam assistance using a surround-view camera. In addition to detecting other road users and landmarks, features such as depth estimation and SLAM are also important for inferring the distance to objects and controlling the vehicle’s position.


Low-speed braking

One study showed that automatic rear braking significantly reduced collision claims: vehicles equipped with a rear camera, park assist, and automatic braking reported 78% fewer collisions. Surround-view camera systems are well suited to low-speed braking, since the feature is built on a combination of depth estimation and object detection.
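As a hedged illustration of how detection and depth might be combined for this feature, here is a minimal braking-decision sketch; the thresholds and the simple time-to-collision rule are assumptions, not taken from the cited study.

```python
# Minimal sketch of combining object detection with depth estimation for
# low-speed automatic braking. Thresholds and the simple time-to-collision
# rule are illustrative assumptions only.

BRAKE_DISTANCE_M = 1.5      # brake immediately inside this range
TTC_THRESHOLD_S = 1.0       # or when time-to-collision drops below this

def should_brake(detections, ego_speed_mps):
    """detections: list of (label, distance_m) pairs from detector + depth estimate."""
    for label, distance_m in detections:
        if label not in ("pedestrian", "cyclist", "vehicle", "obstacle"):
            continue
        if distance_m < BRAKE_DISTANCE_M:
            return True
        ttc = distance_m / max(ego_speed_mps, 1e-3)  # crude time-to-collision
        if ttc < TTC_THRESHOLD_S:
            return True
    return False

print(should_brake([("pedestrian", 1.2)], ego_speed_mps=1.0))  # True
print(should_brake([("vehicle", 6.0)], ego_speed_mps=2.0))     # False
```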

Fisheye Camera

Fisheye cameras offer a clear advantage for autonomous driving applications, as their extremely wide field of view allows the entire surroundings of the vehicle to be viewed with a minimum of sensors. Typically, only four cameras are required to cover 360°. However, this advantage comes with a cost, given the more complex projection geometry. Several papers in the past have reviewed how to model fisheye geometry, e.g. [34]. We do not intend to repeat this here, but instead focus on the issues that the use of fisheye camera technology brings to autonomous driving vision.

Standard field-of-view cameras closely follow the principles of rectilinear projection and perspective: straight lines in the real world project to straight lines on the image plane, and sets of parallel lines project to straight lines that converge at a vanishing point. Deviations caused by optical distortion are easily corrected. Many automotive datasets provide image data with optical distortion removed, apply simple correction methods, or have distortion that is barely perceptible.

Most automotive vision research therefore implicitly assumes rectilinear projection, yet fisheye perspective is very different. A straight line in the scene projects to a curve on the fisheye image plane, and sets of parallel lines project to curves that converge at two vanishing points [38]. Distortion is not the only effect, however. Figure 8 shows images from a typical mirror-mounted camera in a surround-view system: in a fisheye camera, the orientation of objects in the image depends on their position in the image. In this example, the vehicle on the left is rotated by nearly 90° compared to the vehicle on the right, which undermines the translation invariance assumed by convolutional approaches to object detection. In a standard camera, translation invariance is an acceptable assumption; as Figure 8 shows, this is not the case for fisheye images, and any computer vision algorithm design must consider carefully how it is handled.
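The difference between the two projections can be illustrated with a small numerical sketch, assuming an ideal equidistant fisheye model (image radius r = f·θ) against a pinhole model; the focal length and angles are arbitrary, and real fisheye lenses require a calibrated distortion model.

```python
import numpy as np

# Sketch contrasting rectilinear (pinhole) projection with an equidistant
# fisheye model (image radius r = f * theta). Focal length and angles are
# arbitrary; a real fisheye lens needs a calibrated distortion model.

f = 300.0  # focal length in pixels (illustrative)

def pinhole_radius(theta):
    """Image radius of a ray at angle theta from the optical axis, rectilinear model."""
    return f * np.tan(theta)

def equidistant_radius(theta):
    """Image radius under the equidistant fisheye model."""
    return f * theta

for deg in (10, 45, 80, 95):
    theta = np.radians(deg)
    # A pinhole camera cannot image rays at 90 degrees or more from the axis.
    pin = pinhole_radius(theta) if deg < 90 else float("inf")
    print(f"{deg:3d}°  pinhole: {pin:10.1f}px   fisheye: {equidistant_radius(theta):7.1f}px")
```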


The natural way to address these problems is to rectify the image in some way. Rectification to a single planar image can be ruled out immediately: first, too much of the field of view is inevitably lost, offsetting the advantage of the fisheye camera, and second, interpolation and perspective artifacts quickly dominate the rectified output. A common approach is multi-plane rectification, in which different parts of the fisheye image are warped onto different planar images. For example, a cube can be defined and the image warped onto its faces. Figure 9 shows the warping onto two such faces. Even here, interpolation and perspective effects are visible, and the complexity of the transitions between faces must be handled.


Another rectification approach is to warp the image onto a cylindrical surface, as shown in Figure 10, with the cylinder axis arranged perpendicular to the ground. The observation behind this is that most objects of interest in automotive scenes lie on or near a roughly horizontal plane, the road surface, so it is desirable to preserve the horizontal field of view while sacrificing some of the vertical field of view. This leads to an interesting combination of geometries.


Vertically, the projection uses linear perspective, so vertical lines in the scene project to vertical lines in the image. Objects that are farther away, or small in the image, look visually similar to those seen by a perspective camera, and it has even been suggested that, with this warping, networks trained on standard perspective images can be applied directly to the warped fisheye images without retraining [39]. In the horizontal direction, however, the new images remain distorted, with large close-up objects showing strong distortion, sometimes even stronger than in the original fisheye images.
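A simplified sketch of how such a cylindrical rectification map might be built is shown below, assuming an ideal equidistant fisheye aligned with the cylinder; a real pipeline would use calibrated intrinsics and extrinsics and apply the resulting lookup table with something like OpenCV's remap. All constants here are illustrative.

```python
import numpy as np

# Simplified sketch of a cylindrical rectification lookup for a fisheye image.
# Assumes an ideal equidistant fisheye (r = f * theta) whose optical axis is
# horizontal and centred on the cylinder; a real pipeline would use calibrated
# intrinsics/extrinsics and apply the map with cv2.remap.

FX = 300.0                 # fisheye focal length in pixels (illustrative)
CX, CY = 640.0, 480.0      # fisheye principal point (illustrative)

def cylindrical_to_fisheye(phi, h, f_cyl=300.0):
    """Map a cylindrical image coordinate (azimuth phi [rad], vertical offset h [px])
    to a pixel in the fisheye image."""
    # Ray on the unit cylinder: horizontal angle phi, vertical direction from linear perspective.
    ray = np.array([np.sin(phi), h / f_cyl, np.cos(phi)])
    ray /= np.linalg.norm(ray)
    theta = np.arccos(ray[2])            # angle from the optical axis (z)
    r = FX * theta                       # equidistant fisheye model
    psi = np.arctan2(ray[1], ray[0])     # direction in the fisheye image plane
    return CX + r * np.cos(psi), CY + r * np.sin(psi)

# One column of the cylindrical image (fixed azimuth) sampled at several heights:
# these are the fisheye pixels it would be resampled from.
for h in (-100, 0, 100):
    print(cylindrical_to_fisheye(0.3, h))
```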
