

What other obstacles stand in the way of the engineering implementation of autonomous driving? A macro-level design view

Latest update time: 2022-07-05

This article analyzes, at a macro level, the key difficulties in the engineering implementation of autonomous driving development. It covers the issues and challenges of overall V-model development in the industry, the key difficulties of scenario design within system design, and how the front end of system design works with the functional safety team to carry out detailed safety analysis and design. These design activities are interlocked and tightly integrated in the left half of the V model during early development.



Issues and challenges in V model development

The V model has long been the standard model in the automotive software development process. Its core idea is to support and manage the entire development process through the ASPICE process, including documentation and planning. Across the whole V model, every stage from requirements down to source code has a corresponding level of testing.

Focusing on the left side of the V, from system development down to software development, each stage needs to achieve the development capabilities prescribed by the V model.

ASPICE can also accommodate the agile development approach that is currently popular in the automotive field: a flexible process that starts from a smaller set of requirements and iterates through the V-model. So how can a suitable V or V-like development model be adapted from an agile perspective? In practice it is relatively simple to combine multiple local V cycles, because the core of agile development is to perform verification and testing in multiple stages throughout development. The transformed model that combines the V model with agile development is therefore a multi-W model, or a merger of multiple Vs. Against the V model described above, the development and testing of autonomous driving still faces several challenging issues.

Setting ASPICE aside, a multi-level assessment mechanism can also be built simply by evaluating overall development capability across the whole development process. Referring to a tiered mechanism similar to that of an autonomous driving system, the process assessment mechanism is deployed in stages and levels to achieve different process capabilities. The detailed process includes the following:


To address these problems, the industry has proposed several workable methods, including phased deployment through progressively relaxed operating scenarios; a "monitor-driven" architecture that separates the most complex autonomous functions from simpler safety functions; and fault injection for more effective edge-case testing. Overly complex requirements, such as handling driver-absence situations, can be addressed with non-deterministic algorithms, inductive learning algorithms, fault injection, and safe/operational fallback measures.
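As a rough illustration of the "monitor-driven" idea, the hypothetical sketch below separates a complex planning function (the doer) from a simple, independently verifiable safety monitor (the checker). All class names, parameters, and thresholds are illustrative assumptions, not from any specific stack.

```python
# Minimal doer/checker sketch: a simple safety monitor supervises a complex planner.
# All names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Command:
    accel_mps2: float      # requested longitudinal acceleration
    steer_rad: float       # requested steering angle

class ComplexPlanner:
    """Stands in for the full (hard-to-verify) autonomous driving function."""
    def plan(self, gap_m: float, ego_speed_mps: float) -> Command:
        # naive behavior: keep accelerating gently unless the gap is closing
        accel = 0.5 if gap_m > 30.0 else -1.0
        return Command(accel_mps2=accel, steer_rad=0.0)

class SafetyMonitor:
    """Simple checker that can override the doer with a bounded safe command."""
    MIN_GAP_M = 10.0
    MAX_DECEL = -6.0

    def check(self, cmd: Command, gap_m: float) -> Command:
        if gap_m < self.MIN_GAP_M:
            # override with an emergency braking command
            return Command(accel_mps2=self.MAX_DECEL, steer_rad=0.0)
        return cmd

planner, monitor = ComplexPlanner(), SafetyMonitor()
cmd = monitor.check(planner.plan(gap_m=8.0, ego_speed_mps=20.0), gap_m=8.0)
print(cmd)   # the monitor has overridden the planner with emergency braking
```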


Customizing to customer needs: detecting and handling extreme (corner-case) scenarios

In the V-model development described in the first part, the left half is mainly about identifying customer needs and custom-developing the functional scenarios that fit those needs. We know that the next battlefield for autonomous driving is the effective detection of extreme/edge scenarios (corner cases), whose core is detecting unknown-unexpected or unknown scene conditions. Yet even today, with so many sensors, autonomous driving systems still struggle to adapt their detection capability to the full set of functions.

The typical scenarios are organized below by scene classification and working condition; for each, the key points of the scene, the main difficulties, and suggested solutions are listed.
City road scenario: unprotected left turn

Key points of the scene:
1. Identify the intersection (no dedicated left-turn light; yield to vehicles going straight);
2. Convey the left-turn intention to the oncoming straight traffic in advance;
3. Determine a safe distance and speed for the left turn and pass through;
4. Immediately after completing the turn, slow down to avoid pedestrians and non-motorized vehicles.

Difficulties:
1. The sensors lack beyond-line-of-sight perception and cannot detect occluded straight-traveling vehicles in advance (including their distance, speed, and deceleration intention);
2. During the steering maneuver, recognition of the motion of pedestrians and non-motorized vehicles is limited (including pedestrians' gaze, gestures, and deceleration intentions);
3. Human driving intentions cannot be accurately recognized, which limits decision-making.

Suggested solutions:
1. Increase the resolution/point-cloud density of the on-vehicle sensors;
2. Integrate roadside sensing and data from other vehicles to complement single-vehicle intelligent control.
City road scenario: roundabout handling

Key points of the scene:
1. Before entering the roundabout, change into the entry lane;
2. Yield to pedestrians and non-motorized vehicles at zebra crossings that have no traffic-light guidance;
3. Yield (a second time) at the yield sign to vehicles already traveling on the roundabout, and wait for a gap to enter;
4. While driving in the roundabout, choose the inner, middle, or outer lane for exiting and change lanes in advance.

Difficulties:
1. Driving on the inside of the roundabout: the conditions are simple, but the number of lane changes increases, which challenges trajectory planning and decision-making control;
2. Driving on the outside of the roundabout: the conditions are complex and vehicles may merge in at any time, although exiting is easier; this greatly challenges close-range perception and decision-making control;
3. Conveying the vehicle's intention when exiting the roundabout is limited.

Suggested solutions:
During system development, select the timing of the exit lane change effectively, convey the intention to leave the roundabout in a timely manner, exit safely and quickly, and watch for pedestrians at the zebra crossing.
Highway/expressway scenario: ramp handling

Key points of the scene:
1. Identify the ramp 2 km in advance;
2. Change lanes in advance into the lane closest to the ramp;
3. Slow down to the required ramp speed in advance;
4. After decelerating, drive onto the ramp and continue deceleration control in a timely manner;
5. (Optional) Keep continuous centering control within the ramp, with a certain offset toward the inside;
6. When leaving via the off-ramp, sense vehicles traveling on the fork in advance and accelerate into the target lane in a timely manner.

Difficulties:
1. Early detection and localization of the ramp;
2. Decision-making for multi-lane lane changes (there is a certain time interval between two lane changes, and the ramp may be missed if there are too many lanes);
3. Ramp speed-limit control (the vehicle may have to change lanes while decelerating, which carries a certain risk).

Suggested solutions:
1. Integrate high-precision positioning and navigation data to locate ramp information in advance;
2. Strengthen visual perception (e.g., high-resolution cameras) to detect ramp speed-limit information in advance;
3. Preview the lane-change trajectory in advance through multi-curve fitting, assess the risk of the lane change toward the ramp early, and plan the entry into the ramp ahead of time (a trajectory-fitting sketch follows this table).
Highway/expressway scenario: merge handling

Key points of the scene:
1. Detect signs of the road narrowing ahead in advance;
2. Detect the traffic conditions of the adjacent lane in advance;
3. Initiate lane-change control in a timely manner;
4. Speed up or slow down in time to match the traffic in the target merging lane.

Difficulties:
1. The lane-width detection range is insufficient (relying on the camera alone, lane width can generally only be detected about 200 m ahead);
2. Merging is a passive lane change; the vehicle may not manage to merge into the target lane before the current lane disappears, which carries a certain risk.

Suggested solutions:
1. Integrate high-precision positioning maps to improve the detection of lane information;
2. Strengthen visual perception (e.g., high-resolution cameras) to detect lane-width information in advance.
Bad weather: rain, snow, fog, dust

Key points of the scene:
1. The camera's detection capability is limited;
2. The lidar's detection capability is limited.

Difficulties:
1. Lidar uses near-infrared light whose propagation is easily affected by rain, snow, fog and other bad weather: light rain scatters the laser pulses and attenuates their energy, easily causing missed detections; moderate to heavy rain strengthens the echoes returned by raindrops while weakening the pulses that pass through them, easily producing false obstacles;
2. Camera detection is directly degraded by heavy rain, which obstructs detection and blurs the image;
3. Once snow accumulates on the road surface and covers the lane lines, lane-line-based localization and drivable-area judgment fail.

Suggested solutions:
1. Use multi-echo measurement algorithms to filter out echo pulses scattered by raindrops; waveform recognition can further reduce false detections;
2. Add inputs such as rain-sensor and wiper data to recognize actual rain or snow conditions, lower the confidence assigned to lidar and camera perception data, and raise the confidence assigned to millimeter-wave radar and high-precision positioning.
Irregular road surface: potholes, bumps, dead-end roads, etc.

Key points of the scene:
1. Identify the road-surface state through the sensors and send it to the control side in time;
2. The control side decides on timely risk-avoidance measures based on the actual body posture, acceleration/deceleration, distance, and so on.

Difficulties:
1. Recognizing the actual road-surface state is a major challenge: many unusual conditions have never been trained on, and the system simply cannot understand them;
2. Even if the actual road condition can be understood, the system alone cannot intelligently decide which measure is better: for example, with an irregular obstacle ahead, is steering around it or braking for it the option that causes less harm?

Suggested solutions:
1. Build a data closed loop early to enrich the recognition logic for such road surfaces;
2. In system function design, exploit the control logic of related subsystems to optimize control; for example, combine intelligent chassis suspension so that after a pothole ahead is identified, the body is lifted in advance to avoid the large jolt of dropping into it.
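As a rough illustration of the "multi-curve fitting" suggestion in the ramp-handling entry above, the sketch below previews a lane-change trajectory with a quintic polynomial and rejects it when the required lateral speed is too high. The boundary conditions, lane offset, and thresholds are illustrative assumptions.

```python
# Minimal sketch of previewing a lane-change trajectory via quintic curve fitting.
import numpy as np

def quintic_lane_change(lane_offset_m: float, duration_s: float, dt: float = 0.1):
    """Lateral offset profile y(t) with zero lateral velocity/acceleration at both
    ends, the usual boundary conditions for a comfortable lane change."""
    T = duration_s
    # y(t) = a3*t^3 + a4*t^4 + a5*t^5 with y(T) = d and y'(T) = y''(T) = 0
    a3 = 10.0 * lane_offset_m / T**3
    a4 = -15.0 * lane_offset_m / T**4
    a5 = 6.0 * lane_offset_m / T**5
    t = np.arange(0.0, T + dt, dt)
    y = a3 * t**3 + a4 * t**4 + a5 * t**5
    y_dot = 3 * a3 * t**2 + 4 * a4 * t**3 + 5 * a5 * t**4
    return t, y, y_dot

def lane_change_is_risky(y_dot: np.ndarray, max_lat_speed_mps: float = 1.5) -> bool:
    """Reject the previewed trajectory if the lateral speed it requires is too high,
    e.g. because the remaining time before the ramp is too short."""
    return float(np.abs(y_dot).max()) > max_lat_speed_mps

t, y, y_dot = quintic_lane_change(lane_offset_m=3.5, duration_s=4.0)
print("lane change risky:", lane_change_is_risky(y_dot))
```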

At the system development level, because large-scale datasets covering every type of edge scenario are generally lacking, corner-case detection for autonomous driving testing is usually an open-world problem. Some corner cases focus on concentrated sample characteristics, such as recognizing a certain class of unusual vehicle, an unusual lane, or an unusual road marking. Other corner cases lean more toward inferring the unknown from the observable real world: for example, inferring from a hard-braking lead vehicle that an unforeseeable accident or something like a construction zone may lie ahead, or inferring from a BEV-built map whether a dead-end road lies ahead. In addition, urban autonomous driving also requires handling complex conditions such as unprotected left turns, roundabouts, mixed pedestrian and vehicle traffic, overtaking on narrow roads, and U-turns at large intersections. Some of these maneuvers are challenging even for human drivers, so they naturally become challenges for the development of the entire autonomous driving system.

At present, unsupervised methods, or methods trained only on normal samples, appear to be the most effective way to detect edge scenarios. Approaches that rely on training with anomalous scenario data instead require relatively complex, specialized training sets.

Taking the cameras of an autonomous driving system as an example, several typical extreme-scenario detection processes are listed below. Broadly, they cover unknown scenarios, novel scenarios, risk scenarios, target objects, domain/scope, and pixel-level/point-cloud-level scenarios. Detecting these anomalous scenarios mainly involves prediction, reconstruction, generative approaches, feature extraction, post-processing, confidence estimation, and scenario self-learning.

1. Pixel-level/point-cloud-level scene analysis

This kind of microscopic, pixel-level/point-cloud-level scene detection typically targets the image or the entire lidar imaging process, covering micro-level conditions such as image exposure, real-time point-cloud detection, and damaged or unclear pixels. Take repairing the scene shown by damaged pixels as an example. As mentioned earlier, scene recognition can be done through self-learning; how, exactly? Here pixel repair can be done through self-supervised learning: simulate the likely values of the affected pixels, insert pixels of the same type, and label them pixel by pixel, so that the positions of the damaged pixels can be predicted and compared with their actual positions. In some cases the actual positions do not fully agree with the positions predicted from optical-flow learning, and may even contradict them; semantic segmentation methods can then be used to handle the detection problem.
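A minimal sketch of the self-supervised pixel-repair idea, assuming PyTorch: random pixels are masked out, a small convolutional network learns to reconstruct them, and at inference a large reconstruction error flags damaged or anomalous pixels. The network, masking ratio, and training loop are illustrative.

```python
# Minimal self-supervised inpainting sketch for pixel-level anomaly detection.
import torch
import torch.nn as nn

class TinyInpainter(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = TinyInpainter()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(10):                                   # toy training loop
    imgs = torch.rand(4, 3, 64, 64)                   # stand-in for camera frames
    mask = (torch.rand(4, 1, 64, 64) > 0.2).float()   # 1 = keep, 0 = "damaged"
    corrupted = imgs * mask                           # zero out the masked pixels
    recon = model(corrupted)
    # supervise only on the masked (damaged) pixel positions
    loss = ((recon - imgs) ** 2 * (1 - mask)).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# At inference: the per-pixel reconstruction error map highlights suspicious pixels.
with torch.no_grad():
    err = ((model(corrupted) - corrupted) ** 2).mean(dim=1)   # one error map per image
```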

Speaking of semantic segmentation, a common method that comes to mind is feature matching: with a suitable loss function, the detected region of interest in the recognized image is compared against samples in the source domain, and the matching feature value is obtained at the minimum loss.
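A toy sketch of this matching step, assuming feature vectors have already been extracted (for example by a CNN backbone); the dimensions and the squared-error loss are illustrative choices.

```python
# Minimal sketch of feature matching against source-domain samples.
import numpy as np

def match_feature(roi_feature: np.ndarray, source_features: np.ndarray):
    """Return the index and loss of the source-domain sample whose feature
    minimizes a squared-error loss against the detected ROI feature."""
    losses = np.sum((source_features - roi_feature) ** 2, axis=1)  # L2 loss per sample
    best = int(np.argmin(losses))
    return best, float(losses[best])

# toy usage: 100 source-domain samples with 128-D features, one detected ROI
source = np.random.rand(100, 128)
roi = np.random.rand(128)
idx, loss = match_feature(roi, source)
print(f"best match: sample {idx}, loss {loss:.3f}")
```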

2. Object-level scene analysis

This type of detection is easy to understand: it recognizes objects and scenes that have not been detected before (for example, autonomous driving often encounters unusual obstacles ahead such as dropped tires, fallen rocks, or odd-shaped vehicles). With pure visual perception, such objects are often misrecognized or missed entirely. We know that scene recognition cannot rely on endlessly enriching a library of extreme-scenario training samples, because the unknown scenarios of the real world can never be exhaustively enumerated. Instead, each newly encountered unknown condition can be fed into self-supervised learning. During learning, a Bayesian confidence score is used to fit a high-uncertainty model associated with those unknown objects; although such a model is not fully consistent with the scenario library, the self-supervised result still provides necessary supporting evidence for the next perception output.

In intelligent-vehicle visual perception, the commonly used feature extraction may well fail to capture complex objects across the whole scene. Many existing methods therefore use a confidence score or an object reconstruction error to distinguish normal from anomalous samples. The confidence scores and error deviations produced by the deep-learning process above effectively indicate model uncertainty; objects with low scores or large deviations can be directly declared anomalous, and supervised training on these anomalous-context scenarios can then locate the relevant anomalous objects.
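One common way to obtain such an uncertainty-style confidence score is Monte-Carlo dropout; the sketch below, assuming PyTorch, is only illustrative, and the classifier, sample count, and thresholds are stand-ins.

```python
# Minimal sketch of uncertainty-based anomaly flagging via Monte-Carlo dropout.
import torch
import torch.nn as nn

class SmallClassifier(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Dropout(p=0.5),               # kept active at test time for MC dropout
            nn.Linear(16, n_classes),
        )

    def forward(self, x):
        return self.backbone(x)

def mc_dropout_uncertainty(model, x, n_samples=20):
    """Run several stochastic forward passes; return mean confidence and
    predictive variance. Low confidence / high variance suggests a corner case."""
    model.train()                             # keep dropout active
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)
    confidence, _ = mean_probs.max(dim=1)     # confidence score per object
    variance = probs.var(dim=0).mean(dim=1)   # spread across the MC samples
    return confidence, variance

model = SmallClassifier()
crops = torch.rand(2, 3, 64, 64)              # stand-in for detected object crops
conf, var = mc_dropout_uncertainty(model, crops)
is_anomaly = (conf < 0.5) | (var > 0.05)      # illustrative thresholds
```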

3. Inter-frame scene analysis

Why single out the inter-frame level? The previous two cases both involve image-level detection capability and typically identify the anomalous corner-case object within a single image. Inter-frame scene analysis, by contrast, examines anomalous conditions in the overall video stream from an optical-flow-like perspective; its scope is defined along the time axis. Why put it this way? Because in such a video stream no single frame contains any anomaly, yet the sequence as a whole is extremely anomalous.

For example, consider a hard-braking vehicle ahead. A single frame merely shows a vehicle in front, but perception as a whole needs inter-frame image matching to recognize the vehicle's decelerating motion trend (radar data, if available, can of course be fused in). If the deceleration trend is too violent, it indicates that an anomalous corner case is appearing ahead.

Several approaches can be applied to inter-frame anomalies in visual perception. The most common is inter-frame motion estimation, whose core is to find the same pixels, or segmented image patches, at their different positions in two frames and subtract them to obtain the corresponding vector estimate. Multiplying this vector by the inverse of the projection matrix gives the vehicle's motion vector in the real world. A corner case is then judged to have occurred when this vector deviates significantly from the vector at the previous moment, and the perception training module can set a policy target accordingly.
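A minimal sketch of this inter-frame motion estimation using dense optical flow in OpenCV; the averaging over all pixels, the deviation threshold, and the synthetic frames are simplifying assumptions (a real pipeline would work per object or per patch and map the flow into world coordinates via the projection matrix).

```python
# Minimal sketch of inter-frame motion estimation with Farneback optical flow.
import cv2
import numpy as np

def frame_motion_vector(prev_gray: np.ndarray, curr_gray: np.ndarray) -> np.ndarray:
    """Estimate a mean 2-D pixel motion vector between two grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    return flow.reshape(-1, 2).mean(axis=0)        # average (dx, dy) over all pixels

def looks_like_corner_case(v_prev: np.ndarray, v_curr: np.ndarray,
                           threshold: float = 5.0) -> bool:
    """Flag a corner case when the motion vector deviates sharply from the
    previous one (e.g., sudden hard braking of the lead vehicle)."""
    return float(np.linalg.norm(v_curr - v_prev)) > threshold

# toy usage with synthetic frames; real code would read consecutive camera images
prev = np.random.randint(0, 255, (240, 320), dtype=np.uint8)
curr = np.roll(prev, 4, axis=1)                    # simulate horizontal motion
v = frame_motion_vector(prev, curr)
print("corner case:", looks_like_corner_case(np.zeros(2), v))
```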

It should be noted that when analyzing inter-frame scenes to find corner cases, some unexpected situations may be entirely unpredictable. For example, a pedestrian who suddenly darts into the street from behind an occluding vehicle (sometimes called a "ghost" in the autonomous driving field) requires identifying scene pixels that are occluded or outside the field of view, and such pixels are not included in the pixel masks of the preceding frames. The process therefore must achieve effective detection within a very short time once the pedestrian pixels first appear in a frame. This demands not only extreme speed but also real-time learning of the scene's anomalous appearance, and capability in this area still needs major improvement.

Corner-case detection includes both online and offline methods: online detection can serve as a safety monitoring and warning system, while offline detection is used in the lab to develop new visual perception algorithms and to select suitable training and test data.


How to design a truly safe control system

To obtain market access for their products, OEMs usually have to provide safety assessment reports and access applications, much of which describes the safety of their products. This mainly covers the following aspects:


The safety-redundancy system addresses the system state in which a given control task cannot be completed and may therefore lead to a risk of collision. For example, for an L3 autonomous driving system approaching its ODD exit or experiencing an automated-driving failure, a receptive fallback-ready user should be prepared to take over the driving task; where necessary, the vehicle should be stopped automatically in its current lane of travel or pulled over to a side lane.
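A rough sketch of this fallback logic (selection of a minimal risk maneuver) is shown below; the states, time horizons, and thresholds are illustrative assumptions, not taken from any standard.

```python
# Minimal sketch of fallback (minimal-risk-maneuver) selection for an L3 system.
from enum import Enum, auto

class Maneuver(Enum):
    CONTINUE = auto()          # keep automated driving
    REQUEST_TAKEOVER = auto()  # ask the fallback-ready user to take over
    STOP_IN_LANE = auto()      # stop in the current lane of travel
    PULL_OVER = auto()         # pull over to a side lane / shoulder

def select_maneuver(odd_exit_in_s: float, takeover_confirmed: bool,
                    shoulder_available: bool, system_failed: bool) -> Maneuver:
    if takeover_confirmed:
        return Maneuver.CONTINUE            # driver has taken over; system will disengage
    if not system_failed and odd_exit_in_s > 30.0:
        return Maneuver.CONTINUE
    if not system_failed and odd_exit_in_s > 10.0:
        return Maneuver.REQUEST_TAKEOVER    # warn the fallback-ready user first
    # no takeover and the ODD exit / failure is imminent: perform a minimal risk maneuver
    return Maneuver.PULL_OVER if shoulder_available else Maneuver.STOP_IN_LANE

print(select_maneuver(odd_exit_in_s=8.0, takeover_confirmed=False,
                      shoulder_available=True, system_failed=False))
```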

To analyze the entire safety-redundancy system effectively, the OEM conducts hazard analysis and risk assessment within the functional safety process. This assessment establishes the functional safety levels of the hardware and software developed in the system. For highly safety-related functions, specific redundancies are built in so that a failure of these systems does not create an unreasonable safety risk.

In the current service-oriented (SOA) architectures of autonomous driving systems, from a functional safety perspective service discovery should be designed to support both "dynamic discovery" and "static configuration", so that a redundant backup path is available during service access. For example, when radar perception data and camera perception data are accessed as two services and "dynamic discovery" is used to fuse the perception data, failures such as "service name error" or "broadcast with no response" may occur; the most basic part of the service (such as basic radar status data) must then still be discoverable through a "static table".
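A minimal sketch of this dynamic-discovery-with-static-fallback idea, assuming a SOME/IP-like SOA; the registry, service names, and endpoints are all illustrative.

```python
# Minimal sketch of service discovery with a static-table fallback.
from typing import Dict, Optional

# static configuration: the most basic services that must always be reachable
STATIC_TABLE: Dict[str, str] = {
    "radar_basic_status": "ip://10.0.0.12:30501",
}

class ServiceDiscovery:
    def __init__(self, dynamic_registry: Dict[str, str]):
        self.dynamic_registry = dynamic_registry      # filled by offer/find messages

    def resolve(self, service_name: str) -> Optional[str]:
        # 1) try dynamic discovery first
        endpoint = self.dynamic_registry.get(service_name)
        if endpoint is not None:
            return endpoint
        # 2) fall back to the static table (covers name errors / lost broadcasts)
        return STATIC_TABLE.get(service_name)

# toy usage: the dynamic offer for the radar service was lost
sd = ServiceDiscovery(dynamic_registry={"camera_objects": "ip://10.0.0.13:30509"})
assert sd.resolve("camera_objects") == "ip://10.0.0.13:30509"      # dynamic hit
assert sd.resolve("radar_basic_status") == "ip://10.0.0.12:30501"  # static fallback
```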

The service-name errors mentioned above also have corresponding requirements in a redundancy design that meets functional safety: depending on the functional safety requirement, the "name service" must have at least one backup instance, so that when access via the primary service name fails, the corresponding service can still be reached through the backup name. Key services must have multiple instances whose service names, identifiers, methods, events and so on are exactly identical.
A few typical examples follow:

1) Fail-operational

Taking the fail-operational response of an autonomous driving system as an example, this design principle essentially assumes a fully dual-redundant system. For many OEMs, designing such a system means working the six redundancy concepts into the architecture step by step: in addition to the communication, braking, steering, and power-supply redundancy required by traditional approaches, the most important parts are full redundancy of perception information and of control commands within the domain controller. In a next-generation high-performance computing platform, the perception system takes in multiple information inputs that are processed on different internal chips. Different sensor information is fed to different SoCs at the same time, so that when perception processing fails on one SoC, the required sensor information can be processed on another SoC in time. For example, the sensing architecture is usually designed so that if processing of the main detection sensor (such as a camera) fails on SOC#1, the camera also connected to SOC#2 can take over processing in time and make up for the missed detection. In addition, SOC#3 can keep the system operating safely through continuous detection by complementary sensors (such as lidar). The overall control capability is unaffected throughout, and the process may not even be perceived by the driver.
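A minimal sketch of this cross-SoC perception failover; the health checks, SoC names, and sensor routing are illustrative assumptions rather than a description of any specific platform.

```python
# Minimal sketch of perception failover across SoCs.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SocNode:
    name: str
    healthy: bool = True
    sensors: List[str] = field(default_factory=list)   # sensor streams this SoC can process

class PerceptionFailover:
    def __init__(self, nodes: List[SocNode]):
        self.nodes = nodes

    def processing_node(self, sensor: str) -> Optional[SocNode]:
        """Return the first healthy SoC able to process the given sensor stream."""
        for node in self.nodes:
            if node.healthy and sensor in node.sensors:
                return node
        return None

# toy usage: the camera is wired to SOC#1 and SOC#2, the lidar to SOC#3
soc1 = SocNode("SOC#1", sensors=["camera"])
soc2 = SocNode("SOC#2", sensors=["camera"])
soc3 = SocNode("SOC#3", sensors=["lidar"])
failover = PerceptionFailover([soc1, soc2, soc3])

soc1.healthy = False                                  # camera processing fails on SOC#1
assert failover.processing_node("camera") is soc2     # SOC#2 takes over the camera
assert failover.processing_node("lidar") is soc3      # complementary lidar is unaffected
```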

2) Fail-safe / fail-degraded

Fail-safe/fail-degraded is another typical safety-redundancy design, addressing the failure process in which the system can no longer keep the vehicle operating normally. It is quite possible that not only can a given sensor no longer provide data for planning and control, but the main domain controller itself has suffered an irreversible problem.

To deal with such fail-safe issues, many OEMs apply safety-of-the-intended-functionality (SOTIF) design at the design stage. For OEMs, SOTIF remains a topic that is difficult to solve perfectly at the start of design: the usual system design does not consider the target scenarios carefully enough, so the system cannot respond correctly to the environment; unreasonable functional-logic arbitration and algorithms can also lead to decision-making problems; and if the actuator output deviates from the ideal output and is hard to control precisely, the whole system can likewise become unsafe. To analyze and resolve these problems and anticipate the aspects involved, it is entirely feasible to analyze the residual risks of the intended function, verify unexpected behavior in known situations, and verify the remaining unknown situations that may lead to unexpected behavior. In this way, known safe conditions are covered, known unsafe conditions are resolved, unknown safe conditions are identified, and unknown unsafe conditions are explored.

3) Safety responsibility design (fail safe)

The safety responsibility here is not only legal but also covers ethical analysis. Functional safety and SOTIF, discussed earlier, both address safe operation under electronic/electrical failures, at the communication level, and in the basic control logic. The question is: what if we also consider whether the vehicle is controlled correctly? For example, during operation the system may detect that the ODD boundary is being exceeded and that the driver needs to take over (with no fault in the system itself, so this is not an abnormal failure); the whole process then requires the system to keep controlling the automated driving of the vehicle until the driver has properly taken over. How is safety control completed during this whole process? What key requirements does it raise? What capabilities must the system possess in advance to handle such a safe takeover? These are questions we need to think about. In addition, for intelligent vehicle control, avoiding a collision with a visible hazard may create another, unpredictable collision: for example, recognizing the emergency braking of the vehicle ahead, the ego vehicle has to brake hard to avoid hitting it, which may cause the vehicle behind to hit the ego vehicle; steering into another lane may avoid this risk but creates lane-change risks of its own and may also cause the vehicle to miss its exit. Will safety risk, operational efficiency, and ethical factors all be taken into account at the start of design? These are questions worth thinking about.

For the next generation of autonomous driving systems, the overall ODD/OEDR adaptability needs to be upgraded precisely. The improvement process includes systematically exposing the non-ODD design issues that the current intelligent driving system cannot solve, to check whether its takeover performance meets requirements, and systematically training and testing the system's ability to monitor and respond to objects and events under extreme conditions, evaluating whether its responses are correct.

