Detailed explanation of in-cabin sensing technology
1. Overview
Compared with the cameras used in ADAS perception systems, the cameras inside a smart cockpit have relatively simple functional and performance requirements. For example, an OMS occupant-monitoring camera generally achieves good results at 5MP. The OMS can also be used for in-car conferencing and child-presence detection. The DMS driver monitoring system, the SVC 360° surround-view cameras, and the DVR driving recorder are all cameras shared between the autonomous driving domain controller (ADC) and the smart cockpit domain controller (CDC). These cameras are briefly described below.
A key characteristic of automotive cameras is the relationship between the camera's installation position and its distance from the ISP processing chip. Cameras below 2MP generally output image data directly in YUV format, with no additional ISP needed. Above 2MP, the camera outputs raw data, and an independent ISP is required to process the image. Because many cameras are used, assigning a dedicated ISP to each one would be very costly. It is therefore common to place the ISP inside the CDC cockpit controller and transmit each camera's raw data over high-speed cables to this centralized ISP, which in turn requires a high-speed video transmission bus.
As described in the chapter on high-speed audio/video transmission interfaces, GMSL or FPD-Link is generally used to transmit the remote camera's raw data to the CDC; technologies that may be adopted in the future include MIPI A-PHY and ASA. SerDes chips are deployed in pairs: a serializer chip is integrated on the camera side, and a deserializer chip sits on the CDC side. The link between them can reach 10 meters, and at most no more than 15 meters.
The bridge chips and transmission cables used for these camera links can be understood from the following figures:
2. OMS
OMS (Occupant Monitoring System) is the system that monitors passengers and the rear seats. On the regulatory side, many regions and countries have already legislated for child-presence detection. Euro NCAP planned to award additional points for detecting the presence of a child in the car starting January 2023, with very detailed provisions. The United States is legislating to require all new cars to be pre-fitted with child-presence detection, expected to be fully implemented in 2025. Corresponding Chinese regulations are reportedly also being evaluated and drafted.
The OMS camera can meet child-presence detection requirements to a certain extent. To improve detection accuracy, a vital-sign detection radar (UWB or millimeter-wave) may even be added. In current practice, a 5MP or even 8MP main OMS camera is generally placed near the interior rearview mirror, and a 2MP rear OMS camera is added above the second and third seat rows. Through AI algorithms such as face recognition, motion capture, and liveness detection, OMS improves the smart cockpit's in-cabin perception.
OMS can also support application functions such as in-car video conferencing, secure payment, and fused sensing for the artificial-intelligence assistant.
To ensure normal operation under any lighting intensity in the cabin, OMS is generally required to support RGB-IR dual-band operation (visible and infrared light), and infrared fill lights must also be provided. The IR band is used for detection at night; RGB visible light is used during the day.
2.1 TOF
TOF stands for Time of Flight. Time-of-flight 3D imaging continuously sends light pulses toward the target and uses a sensor to receive the light reflected from the object, obtaining the target distance by measuring the round-trip flight time of each pulse. The principle is broadly similar to that of a 3D laser scanner, except that the laser scanner measures point by point while the TOF camera obtains depth for the entire image at once. A TOF camera also resembles an ordinary machine-vision imaging system, consisting of a light source, optics, a sensor, control circuitry, and processing circuitry. Although TOF cameras and binocular stereo systems serve very similar applications, their 3D imaging mechanisms are fundamentally different: binocular stereo matches the left and right images and computes depth by triangulation, while a TOF camera derives distance directly from the emitted and reflected light.
Because TOF uses multi-point emission and per-point measurement to compute depth, it is characterized by low resolution and high power consumption. Limited by the number of laser emission points, its resolution generally does not exceed 640×480, and its power consumption is more than ten times that of structured light.
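The core TOF relationship is simple: distance equals the speed of light times the round-trip time, divided by two. A minimal sketch in Python, purely illustrative and not tied to any particular sensor:

```python
# Time-of-flight ranging: the sensor measures the round-trip time of a
# light pulse; the target distance is half the round-trip path length.
C = 299_792_458.0  # speed of light in vacuum, m/s

def tof_distance_m(round_trip_time_s: float) -> float:
    """Convert a measured round-trip pulse time into a target distance in meters."""
    return C * round_trip_time_s / 2.0

# A pulse returning after 10 ns corresponds to roughly 1.5 m:
print(round(tof_distance_m(10e-9), 3))  # → 1.499
```

Note the timing precision this implies: resolving depth to a few millimeters requires measuring time to tens of picoseconds, which is why TOF sensors typically use phase-shift measurement over many modulated pulses rather than timing a single pulse.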
2.2 Binocular structured light + RGB
Binocular structured light is based on binocular stereo vision, which imitates the parallax between the human eyes. Two infrared cameras capture images of the measured object from the left and right; the deviation between corresponding points in the two images is computed, a disparity map is obtained by triangulation, and the disparity is then converted into 3D depth information. "Structured light" here means that an infrared projector casts a simple infrared dot pattern to enhance the texture of the object's surface, after which the IR cameras capture the images and an algorithm performs the post-processing.
Four basic steps of a binocular stereo vision system:
1. Camera calibration: this includes intrinsic calibration of each camera and extrinsic calibration of the stereo pair. The former yields each camera's focal length, optical center, distortion coefficients, and other parameters; the latter yields the rotation and translation between the two cameras' coordinate systems.
2. Stereo rectification: the raw images from the two cameras are corrected using the calibration results so that the two rectified images lie in the same plane and are parallel to each other, i.e., corresponding rows of pixels are collinear.
3. Stereo matching: pixels are matched across the rectified images; each successful match gives the positions of the same real-world point in the two images.
4. Depth computation: for the matched image pair, triangulation is used to compute the disparity of each pixel; the resulting disparity map is then converted into a depth map.
The advantage of binocular stereo is its modest hardware requirements: two ordinary IR cameras plus an SoC with a DSP are enough to compute the depth map. Its disadvantage is that it needs two cameras separated by a certain baseline length, which constrains where it can be installed.
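The triangulation in step 4 reduces to the classic pinhole-stereo relation Z = f·B/d (focal length in pixels, baseline, disparity). A minimal sketch with illustrative numbers, not taken from any real calibration:

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Pinhole stereo triangulation: depth Z = f * B / d.
    focal_px: focal length expressed in pixels
    baseline_m: distance between the two camera centers
    disparity_px: horizontal shift of a matched point between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example: f = 800 px, baseline = 6 cm, disparity = 4 px -> Z = 12 m
print(depth_from_disparity(800, 0.06, 4))  # → 12.0
```

The formula also shows why the baseline matters: with a short baseline, distant points produce sub-pixel disparities, so depth resolution degrades quickly with range.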
2.3 Monocular structured light
The basic principle of structured light is to project light with specific structural features (a laser speckle pattern) onto the subject using a near-infrared laser and to capture it with a dedicated infrared camera. Because different regions of the subject lie at different depths, the captured pattern is deformed, and a computing unit converts these deformations into depth information to recover the subject's three-dimensional structure. In short, the 3D structure of the photographed object is obtained by optical means, and the resulting information is then put to further use. Usually an invisible infrared laser of a specific wavelength serves as the light source; the emitted light is projected onto the object with a certain encoding, and the distortion of the returned pattern is computed by an algorithm to obtain the object's position and depth.
Compared with TOF, structured light has lower power consumption, because projecting over a small area is sufficient; its resolution and accuracy are higher than TOF, and its cost is lower. Compared with binocular structured light, the baseline of monocular structured light can be made smaller, which makes it easier to package in the car interior. It also still works in low light, making it well suited to the in-cabin environment.
2.4 OMS vision solution
OMS plays a major role in the smart cockpit's interior perception system. Beyond speech recognition, multi-modal recognition places growing demands on visual perception. Functions enriched with depth information, such as gesture recognition, facial-expression recognition, emotion recognition, and lip reading, give the in-vehicle AI assistant higher intelligence and improve the user experience of the smart cockpit.
Comparing the sensing camera solutions above: except for the monocular RGB-IR camera, which lacks depth information, the other three are depth-camera options.
The biggest problem with the binocular structured light + RGB solution is that its algorithms require substantial computing resources, so real-time performance is poor, and the cost is essentially tied to resolution and detection accuracy: the higher the resolution and the required accuracy, the more complex the computation. A purely binocular solution is also affected by illumination and by the texture properties of the object. The single added RGB camera is mainly used for display alongside the depth map.
The TOF solution has comparatively low resolution due to hardware constraints, and its sensing accuracy is lower than that of structured light and binocular stereo. It requires multi-point laser emission, so its hardware cost is high, but its algorithmic complexity is low and its real-time performance is high, reaching frame rates up to 120fps with modest compute requirements.
The monocular structured light solution was proposed precisely to address the complexity and robustness problems of the matching algorithm in binocular systems, and it solves them in most environments. However, under strong light the laser speckle at the core of structured light is washed out, so the method may fail in bright sunlight.
3. DMS
DMS (Driver Monitoring System) monitors the driver's fatigue state and dangerous driving behaviors around the clock while the vehicle is in motion. When it detects behaviors such as fatigue, yawning, squinting, smoking, or holding a phone, the DMS analyzes them promptly and issues voice and light alerts, warning the driver and correcting unsafe behavior.
Because DMS mainly monitors abnormal driver behavior, it belongs primarily to the ADC autonomous driving domain rather than the CDC smart cockpit domain. However, the DMS camera is generally installed below the A-pillar inside the cabin, facing the driver directly, so it can also be counted among the cockpit's cameras.
DMS generally uses a 2MP infrared camera and does not need an RGB mode. Its images need to be clear enough for the "machine", so that the AI algorithms can analyze the driver's state accurately; they do not need to look pleasant to a human. Therefore 2MP resolution is sufficient, and infrared fill lighting is required so that the machine can discern the driver's facial features under any lighting conditions.
Note that both DMS and OMS have infrared fill lights, so a dedicated IR-light synchronization signal must be designed to ensure the two fill lights never operate at the same time, avoiding overexposure.
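One simple way to realize this interlock is to assign the two illuminators to alternating frame slots driven by a shared sync counter. A minimal sketch of the idea; the function and signal names are hypothetical, not from any real ECU interface:

```python
# Alternate the DMS and OMS IR illuminators on a shared frame counter so
# they are never lit in the same exposure window (avoids overexposure).
def ir_strobe_schedule(frame_index: int) -> dict:
    """Even frames: DMS illuminator on; odd frames: OMS illuminator on."""
    dms_on = frame_index % 2 == 0
    return {"dms_ir": dms_on, "oms_ir": not dms_on}

# The two lights are mutually exclusive on every frame:
for i in range(4):
    slot = ir_strobe_schedule(i)
    assert slot["dms_ir"] != slot["oms_ir"]
print([ir_strobe_schedule(i) for i in range(2)])
```

In a real system the counter would come from the cameras' shared frame-sync (trigger) line, and each camera's exposure window would be aligned with its own illuminator's slot.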
Further improvements to DMS perception algorithms include eye tracking, facial expression analysis, and emotion monitoring. For multi-modal recognition in concert with the AI assistant, further AI algorithms such as lip-shape detection will also come into use.
4. Driving recorder (DVR)
The in-vehicle DVR (Digital Video Recorder) is the driving recorder. DVRs divide into factory-installed (front-installed) and aftermarket units. An aftermarket DVR is usually a standalone device that uses audio/video coding to convert and compress the data from its own camera and save it to the DVR's local storage. Because an aftermarket device does not need to meet automotive-grade standards and is not included when the car leaves the factory, being installed later through the accessories market, it can use consumer-grade chips, and its reliability falls far short of automotive-grade requirements.
A factory-installed DVR, by contrast, must meet automotive-grade standards and is installed before the car leaves the factory; its service life and reliability must conform to automotive-electronics standards.
Usually, a factory-installed DVR needs no cameras of its own and can directly reuse the cameras of the ADAS autonomous driving domain. Typically, the DVR records image data from the forward wide-angle camera (FOV up to 120°) plus the SVC 360° surround-view cameras. To record data both day and night, and to handle high-dynamic-range scenes such as entering and exiting tunnels, both the forward DVR camera and the SVC surround cameras must satisfy HDR (High Dynamic Range) requirements; accordingly, both the cameras and the ISP must support HDR mode.
5. 360° surround view
SVC (Surround View Cameras) are typically placed at the front of the vehicle (forward), the rear (backward), and the left and right exterior mirrors (sideways). SVC is a multi-camera system that gives the driver a 360° real-time view of the vehicle's surroundings. Through image-composition algorithms, the views of the multiple cameras are fused into a "bird's-eye view" that looks down on the vehicle from above.
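At the heart of this fusion is a per-camera planar homography: ground-plane pixels from each camera are warped into a common top-down canvas. A pure-Python sketch of applying a 3×3 homography to one pixel; the matrix here is an arbitrary illustrative example, not a calibrated one:

```python
# Warp one pixel through a 3x3 homography H: [x, y, w]^T = H @ [u, v, 1]^T,
# then divide by w. Surround-view stitching applies such a warp per camera
# to map ground-plane pixels into the shared bird's-eye canvas.
def apply_homography(H, u, v):
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w

# A simple scale-and-translate homography: scale by 2, shift by (10, 20).
H = [[2, 0, 10],
     [0, 2, 20],
     [0, 0, 1]]
print(apply_homography(H, 5, 5))  # → (20.0, 30.0)
```

In a production system the fisheye images are first undistorted, the homographies come from extrinsic calibration, and the overlapping seams between adjacent cameras are blended.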
SVC likewise belongs mainly to the ADC autonomous driving domain, because parking-assist systems rely on the SVC cameras to perceive parking spaces and the surrounding environment. For this reason, the SVC 360° surround-view cameras are also called Parking Assistance Cameras.
SVC cameras have several distinctive characteristics.
6. Streaming rearview mirror (CMS)
6.1 Regulations
CMS (Camera Monitor System) is the streaming rearview mirror: it replaces the traditional glass mirror by electronic means. It goes by many names, including electronic side mirror, virtual rearview mirror, electronic rearview mirror, and mirror-replacement system; the ISO international standards organization calls it the Camera Monitor System.
The Audi e-tron's CMS uses two 7-inch, 1280×1080 OLED screens, OLED reportedly being chosen because it responds roughly 100 ms faster. Audi states that the exterior camera system helps lower the drag coefficient from 0.28 on the US version to 0.27 on the European version. For a battery-electric vehicle on the highway, this small difference can add about 3 miles of range. The irregularly shaped OLED screens are extremely expensive, and their low mounting angle tends to distract the driver; the next-generation Audi is likely to revert to a standard rectangular screen placed near the A-pillar.
At present, only Japanese and European regulations permit electronic mirror systems to replace glass mirrors. The main European regulations are UN ECE R46-2016, "Uniform provisions concerning the approval of devices for indirect vision and of motor vehicles with regard to the installation of these devices", and ISO 16505:2019, "Ergonomic and performance aspects of Camera Monitor Systems: Requirements and test procedures". There is also IEEE P2020, "Standard for Automotive System Image Quality", which aims to standardize image-quality tests and issues for all cameras on the vehicle, covering both human-vision and computer-vision applications, with its main focus on the image quality of camera imaging systems.
Currently, CMS still requires dedicated HDR cameras, transmission channels, and displays. Camera resolution and frame rate generally top out at 2MP/60fps or 4MP/30fps, and the cost is quite high, so there is still some distance to go before it is fully practical.
China's latest national standard GB 15084-2022 took effect on 2023-07-01 and permits vehicles to be fitted with streaming rearview mirrors, covering Class I, II, and III mirrors (see the official reading guide to the mandatory national standard GB 15084-2022, "Motor vehicles - Devices for indirect vision - Requirements on performance and installation").
6.2 CMS performance requirements
For passenger cars, it is important first to distinguish Class I mirrors from Class III mirrors.
A Class I mirror, also called the electronic interior rearview mirror, uses a rear-mounted camera and streams its video to the central interior mirror display. A Class III mirror, also called the electronic exterior rearview mirror, replaces the exterior mirrors on both sides of the body: video from rearward-looking cameras mounted on the sides of the vehicle is shown on displays inside the cabin.
GB 15084-2022 sets out very specific performance requirements for CMS, together with test criteria:
- Brightness adjustment: the monitor's brightness shall be adjustable, manually or automatically, according to ambient conditions.
- Directional uniformity: to ensure the display remains sufficiently visible when the driver views it from the different expected directions, the luminance falloff across viewing angles is limited and a directional-uniformity requirement is specified.
- Brightness and contrast reproduction: to ensure CMS image quality and a discernible outside view under different usage environments, requirements for brightness and contrast reproduction are specified, evaluated under four typical scenarios: direct sunlight, diffuse ambient light, sunset, and night.
- Grayscale and color reproduction: the CMS shall be able to display at least 8 distinct grayscale levels; color reproduction is tested according to the methods specified in ISO 16505 and must meet the stated requirements.
- Diffusion: to prevent a strong light source striking the camera lens from producing ray-like bright streaks on the display that disturb the driver, a diffusion requirement limits the diffused luminance to no more than 10% of the maximum luminance of the light-source image causing it.
- Halo and lens flare: tested according to the method specified in ISO 16505; the halo and lens-flare area shall not exceed 25% of the displayed image area.
- Point light sources: so that the driver can clearly distinguish the two headlights of a following vehicle at night, point-light-source requirements are specified: the point-light-source detection factor shall be no less than 2.7, or the point-light-source contrast factor no less than 0.12.
- Sharpness, depth of field, and geometric distortion: so that the driver can recognize targets in the field of view outside the vehicle, test standards for sharpness, depth of field, and geometric distortion are specified and verified according to the methods in ISO 16505.
- Frame rate: at least 30fps, which may be reduced to 15fps in low light or when the vehicle is traveling at low speed.
- Imaging time and system latency: the display imaging time shall be less than 55ms; the system latency, measured from an event occurring outside the vehicle to the image appearing on the in-cabin monitor, shall not exceed 200ms.
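The quantitative timing limits above lend themselves to a simple conformance check. A sketch using the thresholds stated in the text; the helper function itself is hypothetical, not part of any standard tooling:

```python
def cms_timing_ok(frame_rate_fps: float, imaging_time_ms: float,
                  system_delay_ms: float, low_light: bool = False) -> bool:
    """Check one CMS timing measurement against the GB 15084-2022 limits:
    frame rate >= 30 fps (15 fps allowed in low light / at low speed),
    display imaging time < 55 ms, end-to-end system delay <= 200 ms."""
    min_fps = 15 if low_light else 30
    return (frame_rate_fps >= min_fps
            and imaging_time_ms < 55
            and system_delay_ms <= 200)

print(cms_timing_ok(60, 40, 120))  # → True
print(cms_timing_ok(30, 70, 120))  # → False (imaging time too long)
```

The 200 ms end-to-end budget is what makes the SoC-based architectures discussed below challenging: every added processing stage (ISP, warping, application functions) eats into it.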
To meet these national test standards as factory-installed automotive equipment, a CMS must be analyzed as a whole system, spanning the camera, the control chip, ISP processing, and the display, and must integrate software and hardware across optics, mechanics, and electronics; only such an integrated design yields satisfactory results.
6.3 CMS system architecture
A CMS can be built in two configurations, an "MCU solution" and an "SoC solution". The former offers simple functionality, a low price, and low latency; the latter offers rich functionality at a higher price and higher latency. The core processing task of a CMS is the ISP. If the images from the CMS camera are only to be displayed, an MCU suffices; if application-layer functions are added (such as blind-spot detection or door-opening warnings), an SoC is needed, and the extra processing also raises system latency. In the MCU solution there is no standalone CMS controller: the ISP can be placed in the display (the approach preferred by display Tier 1 suppliers) or in the camera (the approach preferred by camera Tier 1 suppliers). In the SoC solution, the ISP can reside in a dedicated CMS controller or, in the future, be integrated into the intelligent driving domain controller or the smart cockpit domain controller.
These different solutions carry different system costs. The system architecture options are briefly introduced below:
1. ISP integrated into the display, separate from the camera module: the display supplier prefers to put all processing in the screen, embedding a display-centric processing module on the screen's board to process the image data transmitted from the front-end camera, matching the modular design of the screen as a whole.
2. ISP and camera integrated in the exterior mirror pod, separate from the in-cabin display: the camera supplier prefers to embed the processor in the mirror pods on both sides so the system can work with different in-cabin screens. This keeps the whole system compact: the external pod can be packaged as a slim strip-shaped or semicircular module on each side, and the raw data captured by the camera is ISP-processed in the pod before being shown on the in-cabin display.
3. ISP integrated in the smart cockpit domain controller (CDC), with camera reuse: this solution centers on the central computing platform, using the powerful ISP of the cockpit SoC to process the raw data of the vehicle's existing cameras and thereby optimize cost.
In this solution, the cameras are the vehicle's existing ADAS-domain cameras, the ISP is the cockpit SoC in the central computing platform, and the only new device is the display, so its cost is the best of the three solutions. However, it is challenged by the functional-safety requirements imposed on Class III mirrors, so it has not yet reached the level of immediate commercial use and is best studied as a direction for future development.