SoC chip selection for intelligent driving domain controllers - EEWORLD

The TOPS (tera operations per second) figures advertised by chip manufacturers are usually the theoretical peak of the computing units. They are not what the hardware system as a whole actually delivers. In real workloads, the effective computing power may be only 30% of the theoretical value, or even lower. This gap is captured by the concept of "computing power utilization". For example, suppose a neural network model theoretically needs 1 TOPS but in practice requires an SoC with a nominal rating of 4 TOPS to run. Its utilization is then only 25%. The following table compares the computing power of chips from Tesla, Mobileye, NVIDIA, Huawei, and Horizon.
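The utilization figure in the example can be derived from two inputs: a model's operation count and the frame rate it must sustain. The following Python sketch assumes a hypothetical model that needs 10 G operations per inference at 100 inferences per second. These numbers are illustrative only; they are not measurements of any chip in the table.

```python
def required_tops(ops_per_inference: float, inferences_per_s: float) -> float:
    """Theoretical computing power (in TOPS) that a workload consumes."""
    return ops_per_inference * inferences_per_s / 1e12


def utilization(ops_per_inference: float, inferences_per_s: float,
                nominal_tops: float) -> float:
    """Fraction of the SoC's nominal TOPS that the workload actually uses."""
    return required_tops(ops_per_inference, inferences_per_s) / nominal_tops


# Hypothetical workload: 10 G ops per frame at 100 frames/s needs 1 TOPS.
print(required_tops(10e9, 100))     # 1.0
print(utilization(10e9, 100, 4.0))  # 0.25
```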

Consider measured results for ResNet-50 and MobileNet V1 on SoC A and SoC B. The actual effective computing power varies with image resolution, network structure, and other factors.

Why is this? In general, effective computing power is determined mainly by two factors:

1) Processor computing architecture: As the table above shows, utilization varies greatly across network structures even on the same SoC. A deep learning accelerator is a highly customized computing architecture. Only networks whose structure and execution pattern match the accelerator's characteristics achieve high utilization.

2) Memory bandwidth: Memory bandwidth determines how fast data can be moved. If it cannot keep up with the computing speed, data does not reach the computing units in time. The units then sit idle, and computing power utilization drops sharply. Intelligent driving workloads typically combine high image resolution, small batch sizes, and compact network structures. As a result, they usually demand higher memory bandwidth.
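The roofline model is a standard way to reason about this, though the original does not name it. It says that attainable throughput is the lower of two values: the peak compute rate, and memory bandwidth multiplied by arithmetic intensity (operations per byte moved). The sketch below uses assumed figures: 4 TOPS peak and 50 GB/s DDR bandwidth. It also uses a simplified intensity model in which weights are fetched once per batch and activations once per sample. Under that model, small batches lower intensity.

```python
def attainable_ops_per_s(peak_ops_per_s: float, bandwidth_bytes_per_s: float,
                         intensity_ops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * intensity_ops_per_byte)


def layer_intensity(batch: int, ops_per_sample: float, weight_bytes: float,
                    act_bytes_per_sample: float) -> float:
    """Ops per byte of DDR traffic, assuming weights are loaded once per batch
    and activations once per sample (a simplification)."""
    return batch * ops_per_sample / (weight_bytes + batch * act_bytes_per_sample)


PEAK = 4e12    # assumed nominal 4 TOPS
DDR_BW = 50e9  # assumed 50 GB/s

achieved = attainable_ops_per_s(PEAK, DDR_BW, 20.0)  # assumed 20 ops/byte
print(achieved / PEAK)  # 0.25: memory-bound, three quarters of compute idle

# Larger batches reuse the weights and raise intensity (hypothetical layer sizes).
for batch in (1, 8):
    print(batch, layer_intensity(batch, 1e9, 25e6, 5e6))
```

In this model, batch size 1 yields the lowest intensity. That is the regime the paragraph describes for intelligent driving workloads.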

Horsepower reflects a car's real performance less accurately than its 0 to 100 km/h acceleration time. Likewise, effective computing power reflects a chip's actual performance better than theoretical computing power does. When selecting an SoC, therefore, focus on the effective computing power that the SoC system as a whole can deliver.

2.4. Diversified Computing Power Needs

On the deep learning inference side, chip manufacturers often design their NN processors around their own neural network inference frameworks. As a result, a wide variety of TPUs, NPUs, DPUs, and similar processors has emerged. Manufacturers tailor these processors to the characteristics of neural networks. This makes hardware and software fit each other better and thereby raises computing power utilization.

Besides NN processors, automotive chips from vendors such as Qualcomm and Texas Instruments integrate general-purpose compute units on the SoC, such as GPUs, DSPs, and CV accelerators. These units increase the chip's processing power and make algorithm development more scalable.

In intelligent driving systems, the deep learning processor can handle most of the computation. However, companies with strong algorithm development capabilities often design their own neural network structures for their actual business scenarios. When the operator library that the chip vendor provides for its NN processor cannot meet their needs, they frequently develop custom operators. In addition, functions such as ISP, multi-sensor fusion, and localization and mapping involve non-deep-learning vision algorithms. The GPU, DSP, and CV accelerators on the SoC can cover this part of the computing demand well.
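A simple partitioning pass over a linear sequence of operators can show how unsupported operators end up on other compute units. In this sketch, the operator set `NPU_OPS`, the fallback unit name, and the operator names are all hypothetical. Real vendor toolchains partition full graphs using their own rules.

```python
from itertools import groupby
from typing import Iterable, List, Set, Tuple

# Hypothetical operator library of the NN processor.
NPU_OPS: Set[str] = {"conv2d", "relu", "add", "maxpool", "concat"}


def partition(ops: Iterable[str], npu_ops: Set[str] = NPU_OPS,
              fallback: str = "dsp") -> List[Tuple[str, List[str]]]:
    """Group consecutive operators by the compute unit that will run them."""
    def unit(op: str) -> str:
        return "npu" if op in npu_ops else fallback
    return [(u, list(group)) for u, group in groupby(ops, key=unit)]


model = ["conv2d", "relu", "custom_nms", "conv2d", "add"]
for unit_name, segment in partition(model):
    print(unit_name, segment)
# npu ['conv2d', 'relu']
# dsp ['custom_nms']
# npu ['conv2d', 'add']
```

Each change of unit between segments implies a handoff of intermediate data, usually through shared memory. Operator coverage therefore also affects the memory bandwidth concerns discussed earlier.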

A DSP provides low-power vector processing. Compared with a CPU, its SIMD instructions suit algorithms with high parallelism and contiguous, regular data access. Algorithms with high parallelism but scattered data access behave differently. Deployed on a DSP, they place heavy demands on I/O bandwidth and cannot make full use of its computing power. The high concurrency of a GPU handles such algorithms well. A GPU's graphics capabilities can also meet the rendering and visualization requirements of intelligent driving scenarios.
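A toy count of vector loads can illustrate how data contiguity affects a SIMD unit. The sketch assumes a DSP that loads 16 elements per aligned vector access. This width is illustrative and does not describe any specific DSP. Each distinct aligned block touched counts as one load.

```python
from typing import Iterable


def vector_loads(indices: Iterable[int], width: int = 16) -> int:
    """Number of aligned width-element blocks touched by the given element indices."""
    return len({i // width for i in indices})


n = 256
contiguous = range(n)                 # e.g. a row of pixels
strided = [i * 16 for i in range(n)]  # e.g. one element per 16-element block

print(vector_loads(contiguous))  # 16 loads carry 256 useful elements
print(vector_loads(strided))     # 256 loads carry 256 useful elements
```

With contiguous access, every load delivers 16 useful elements. With scattered access, each load delivers only one, so the same work needs 16 times the memory traffic. This is the I/O bandwidth problem that limits a DSP on such algorithms.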

In summary, when selecting an SoC, plan and allocate computing power according to business needs so that the SoC's modules cooperate efficiently. Do not focus only on deep learning computing power.

3. Security

3.1. Cybersecurity

UNECE WP29 R155 and ISO/SAE 21434 have been released, and China has also issued a series of national standards and regulations on vehicle cybersecurity. These cover cybersecurity technology, processes, data protection, and more. All of this shows that cybersecurity is becoming increasingly important in the intelligent connected vehicle industry.

Implementing cybersecurity mechanisms requires defense in depth:

- The upper layer includes service-oriented application firewalls and authentication and authorization of service access.
- The middle layer includes operating system process access control, file system encryption, Ethernet firewalls, secure communication, debug interface control, and security auditing.
- The bottom layer includes basic functions such as secure boot, secure update, secure storage, and key management.

When selecting a chip, the following cybersecurity aspects are commonly considered:

Chip packaging: prefer chips in BGA packages, whose solder balls sit underneath the package and are harder to probe than exposed leads.
Side-channel resistance: how well the chip defends against side-channel attacks. Many side-channel techniques can extract key assets, such as cryptographic keys, from a running chip.
Debug interface control: the chip's debug interfaces, such as JTAG, should be permanently disableable through a hardware mechanism, or switchable on and off through a software security mechanism.
Secure boot: secure boot usually starts from the chip's BootROM. It verifies the firmware signature to detect malicious tampering and so ensures firmware integrity.
Secure execution environment: an environment that manages key assets at run time, such as the chip's security configuration and keys, and provides hardware-accelerated cryptographic services.
Memory protection: a memory protection unit such as an MMU or MPU. It is usually integrated into the processor and configured by the operating system running on it. An MMU provides address virtualization and isolation, while an MPU enforces access permissions on configured memory regions. Together these isolate the data of the running kernel, processes, and threads.
Unique serial number (SN): a per-chip unique identifier, typically used for security services such as device binding and authentication.
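A chain of trust can illustrate the secure boot item. Each verified stage carries the expected digest of the next stage, and the first digest is anchored in the BootROM. The sketch below is a simplification. Real secure boot verifies public-key signatures, with the public key or its hash held in ROM or OTP. This version substitutes SHA-256 digest comparison so that it runs with the Python standard library alone. Stage names and contents are hypothetical. The digest comparison uses `hmac.compare_digest`, a constant-time comparison, which is relevant to the side-channel item above.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stage:
    name: str
    code: bytes
    next_digest: Optional[bytes]  # expected digest of the next stage; None for the last


def _image(stage: Stage) -> bytes:
    # The digest covers the code and the embedded next-stage digest,
    # so tampering with either one is detected.
    return stage.code + (stage.next_digest or b"")


def build_chain(named_code: List[Tuple[str, bytes]]) -> Tuple[bytes, List[Stage]]:
    """Build stages from last to first; return (root_digest, stages)."""
    stages: List[Stage] = []
    next_digest: Optional[bytes] = None
    for name, code in reversed(named_code):
        stage = Stage(name, code, next_digest)
        next_digest = hashlib.sha256(_image(stage)).digest()
        stages.append(stage)
    stages.reverse()
    return next_digest, stages


def verify_chain(root_digest: bytes, stages: List[Stage]) -> bool:
    """Check each stage against the digest held by the previously trusted stage."""
    expected: Optional[bytes] = root_digest  # stands in for a ROM/OTP-anchored value
    for stage in stages:
        actual = hashlib.sha256(_image(stage)).digest()
        # Constant-time comparison avoids leaking mismatch position via timing.
        if expected is None or not hmac.compare_digest(actual, expected):
            return False  # a real BootROM would halt or enter recovery here
        expected = stage.next_digest
    return True


root, chain = build_chain([("bl2", b"bootloader"), ("kernel", b"os image"),
                           ("app", b"perception")])
print(verify_chain(root, chain))  # True
chain[1] = Stage("kernel", b"patched os", chain[1].next_digest)
print(verify_chain(root, chain))  # False
```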

Beyond these technical requirements, chip selection should also consider the supplier's cybersecurity qualifications, such as whether it has established a cybersecurity management system (CSMS).

3.2. Functional Safety (FuSa)

"Intelligent driving, safety first" is a well-known principle. The SoC is the core of the intelligent driving controller, so its safety performance is key to delivering a safe final product. Functional safety must therefore be evaluated as a core criterion in SoC design and selection:

Whether the Automotive Safety Integrity Level (ASIL) supported by the SoC meets the safety level required by the final product;
Whether the SoC's safety design matches the functional safety concept of the current product;
Whether the SoC takes into account products at different driving automation levels.

To achieve these goals, the SoC supplier's functional safety design and development capabilities must also be assessed comprehensively:

Evaluate the SoC's safety concept, including safety requirements, safe states, the fault tolerant time interval (FTTI), etc.;
Evaluate the SoC's safety mechanisms, including diagnostic mechanisms, self-test mechanisms, safety isolation, and redundancy design;
Evaluate the SoC's safety analysis results, including qualitative and quantitative safety analyses and dependent failure analysis;
Check the qualification report for the SoC development toolchain, including the tool confidence assessment of the software tools and the assessment of the tools' development process;
Check the functional safety audit, certification, and assessment results for the SoC provided by the manufacturer, including whether an independent third party performed them, their scope, and the assessment reports.

The achievable functional safety level depends on the SoC's functional safety goals. During evaluation, break the functional safety level down to each module within the SoC. Then confirm, in both hardware and software, whether the SoC's functional safety design fully and effectively meets the safety requirements of your own product. At the product level, also evaluate how introducing functional safety design may change the product's requirements for SoC computing power, communication bandwidth, storage capacity, and so on. This ensures that the SoC's safety features can actually be implemented in the project.
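Subdividing the safety level per module amounts to a module-by-module comparison: the ASIL the product requires against the ASIL the SoC documentation claims. Here is a minimal sketch. The module names and levels are hypothetical, and a module with no stated level is treated as QM, which is an assumption.

```python
from typing import Dict, Tuple

# Ordering of integrity levels: QM < ASIL A < B < C < D
ASIL_RANK: Dict[str, int] = {"QM": 0, "A": 1, "B": 2, "C": 3, "D": 4}


def asil_gaps(required: Dict[str, str],
              provided: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Modules whose provided ASIL is below the required one, as (required, provided)."""
    gaps = {}
    for module, need in required.items():
        have = provided.get(module, "QM")  # assumption: undocumented means QM
        if ASIL_RANK[have] < ASIL_RANK[need]:
            gaps[module] = (need, have)
    return gaps


required = {"safety_mcu": "D", "npu": "B", "isp": "B", "gpu": "QM"}
provided = {"safety_mcu": "D", "npu": "B", "isp": "QM"}
print(asil_gaps(required, provided))  # {'isp': ('B', 'QM')}
```

An empty result is necessary but not sufficient. The safety analyses and confirmation evidence listed above still have to back each claimed level.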

4. Others

4.1. Memory Bandwidth

Besides executing instructions, the CPU, NN accelerator, GPU, and other units inside the SoC fetch instructions from DDR and read and write data in DDR. A DDR access cannot complete in a single cycle; typical access latency is 100 ns or more. Caches mitigate this latency to some extent. With multiple cores accessing DDR concurrently and randomly, however, DDR bandwidth often becomes the bottleneck for the CPU and the various accelerators.

For example, suppose the NN accelerator spends 50 ms loading and storing DDR data and 50 ms computing for each frame, with the two phases running one after the other. The frame rate is then 10 Hz. If DDR bandwidth is halved, loading and storing takes 100 ms while computation still takes 50 ms, and the frame rate drops to about 6.7 Hz. DDR bandwidth therefore indirectly determines how efficiently each processor and accelerator runs.
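A small model can check the frame-rate arithmetic in the example. With `overlap=False` it matches the example, where transfer and computation run sequentially. The `overlap=True` variant assumes double buffering hides the shorter phase; this is an added assumption not stated in the original.

```python
def frame_rate_hz(transfer_ms: float, compute_ms: float,
                  bandwidth_scale: float = 1.0, overlap: bool = False) -> float:
    """Frame rate of an accelerator whose per-frame DDR traffic takes
    transfer_ms at nominal bandwidth.

    bandwidth_scale: fraction of nominal DDR bandwidth available (0.5 = halved).
    overlap: if True, assume transfer and compute fully overlap (double buffering).
    """
    transfer = transfer_ms / bandwidth_scale  # halving bandwidth doubles transfer time
    frame_ms = max(transfer, compute_ms) if overlap else transfer + compute_ms
    return 1000.0 / frame_ms


print(frame_rate_hz(50, 50))                     # 10.0
print(round(frame_rate_hz(50, 50, 0.5), 1))      # 6.7
print(frame_rate_hz(50, 50, overlap=True))       # 20.0
print(frame_rate_hz(50, 50, 0.5, overlap=True))  # 10.0
```

Even with full overlap, halving the bandwidth makes the transfer phase the bottleneck and halves the frame rate. Hiding latency does not remove the bandwidth limit.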