How NVIDIA series chips are used in the architecture and safety design of autonomous driving research and development (1)
Latest update time:2023-03-29
NVIDIA DRIVE AGX is a scalable, open autonomous vehicle computing platform that serves as the brain of autonomous vehicles.
As the leader in hardware platforms in its category, NVIDIA DRIVE AGX provides high-performance, energy-efficient computing for functionally safe artificial intelligence autonomous driving.
On the hardware side, the NVIDIA DRIVE embedded supercomputing platform processes data from cameras, radar and lidar sensors to sense the surrounding environment, determine the car's location on a map, and then plan and execute safe driving routes.
In terms of software, NVIDIA DRIVE AGX has scalable and software-defined features, and the platform can provide advanced performance to help autonomous vehicles process large amounts of sensor data and make real-time driving decisions.
The open NVIDIA DRIVE software stack also helps developers build perception, mapping, planning and driver monitoring capabilities using redundant and diverse deep neural networks (DNNs).
Through continuous iteration and over-the-air updates, the platform becomes increasingly powerful.
At the same time, the open NVIDIA DRIVE SDK provides developers with all the building blocks and algorithm stacks needed for autonomous driving.
The software helps developers more efficiently build and deploy a variety of advanced autonomous driving applications, including perception, localization and mapping, planning and control, driver monitoring and natural language processing.
This article is divided into several chapters. Taking Orin-X, NVIDIA's most widely used mainstream chip, as an example, it explains how development and application proceed at both the hardware and software levels.
NVIDIA internal architecture design
Taking Orin-X as an example, the CPU includes a main CPU complex based on Arm Cortex-A78AE, which provides general-purpose high-speed computing, and a Functional Safety Island (FSI) based on Arm Cortex-R52, which provides isolated on-chip computing resources and reduces the need for an external ASIL D functional safety CPU.
The GPU is NVIDIA® Ampere GPU, which provides advanced parallel processing computing capabilities for the CUDA language and supports a variety of tools, such as TensorRT, a deep learning inference optimizer and runtime that provides low latency and high throughput. Ampere also offers state-of-the-art graphics capabilities, including real-time ray tracing. Domain-specific hardware accelerators (DSA) are a set of specialized hardware engines designed to offload various computing tasks from the compute engine and perform these tasks with high throughput and energy efficiency.
The figure below shows the high-level architecture of the SoC, divided into three main processing complexes: CPU, GPU, and hardware accelerators.
The internal architecture of the chip is organized into functional blocks:
1. QNX BSP, the low-level operating system software (clock source and system restart; CAN/SPI/I2C/GPIO/UART controllers; configuration registers; system configuration).
2. QNX RTOS, the real-time operating system.
3. Nv multimedia processing module (sensor processing MCU (R5), PVA, DLA, Audio Processor; the R5 MCU configures real-time camera input).
4. Classic AUTOSAR processing module (for the Safety Island lock-step R52s).
5. Safety Service (Arm Cortex-A78AE CPU complex, coherent CPU switch fabric, information security PSC).
6. Neural network processing module (CUDA and TensorRT).
Typical architecture design of autonomous driving based on NVIDIA chips
Conventional SoC system architectures usually adopt a dual-chip SoC+MCU design, or even a three-chip design. Thanks to its computing performance, the SoC is generally better suited than the MCU to front-end perception and planning workloads.
Due to its high functional safety level, MCU can be used as a verification output for control execution. The industry has always had mixed opinions on whether Nvidia chips can simply serve as ultra-heterogeneous chips like TDA4 and undertake independent tasks. In principle, whether it is the Xavier or Orin series, NVIDIA series chip designs have both rich AI and CPU computing capabilities. Considering the development of autonomous driving systems above the L2+ level, this capability can be fully adapted to the entire solution design.
So, has the industry promoted corresponding design solutions in this way? The answer is no.
According to the latest safety requirements in the NVIDIA datasheet, the architectural design recommendations for the Orin series still call for a dedicated MCU to perform failure analysis and risk assessment, so that serious faults in the system can be located in time and the autonomous driving safety integrity requirements defined by the ISO 26262 standard can be met (this is explained separately later). In addition, for power management of the whole domain controller, attaching an external MCU greatly improves power management capabilities, including entering and exiting sleep mode.
The MCU described above can also be called a Safe MCU (SMCU). System development requires an MCU with a higher safety level (generally ASIL D), such as the Infineon AURIX TC series or the Renesas RH850 series, to serve as the SMCU attached to Orin. Such an SMCU handles power control for the whole system and helps avoid serious failures.
The figure above shows a three-layer fail-safe framework for system architectures built on NVIDIA chips. Overall, the architecture implements three levels of fail-safe protection, spanning the SoC layer's basic services, operating system, virtual machines and runtime environment through to the MCU's real-time runtime environment. The SoC layer and the MCU layer perform health monitoring and independent watchdog monitoring over NvIVC, NvIPC, and SPI/error pins respectively. The SoC itself carries part of the lockstep safety verification in the lockstep FSI. A virtual machine hypervisor runs on the core CPU complex, CCPLEX (the CPU complex, Carmel on Xavier, that runs the capture stack and applications), and the QNX operating system, which has a high functional safety level, runs on the CPU cores to schedule resources for the application software watchdog, middleware, application-layer software and driver software. The real-time system still runs on standard AUTOSAR.
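The health and watchdog monitoring between the SoC layers and the MCU can be pictured as a heartbeat check. The following is a minimal Python simulation, not NVIDIA's implementation: the class name, the layer names and the 100 ms timeout are all assumptions made for illustration, and the real monitoring runs over NvIVC, NvIPC and SPI/error pins rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatWatchdog:
    """Hypothetical watchdog: flags a monitored layer whose heartbeat is late."""
    timeout_ms: int
    last_seen_ms: dict = field(default_factory=dict)

    def kick(self, layer: str, now_ms: int) -> None:
        # A monitored layer reports that it is still alive.
        self.last_seen_ms[layer] = now_ms

    def expired(self, now_ms: int) -> list:
        # Return every layer whose last heartbeat is older than the timeout.
        return sorted(
            layer
            for layer, seen in self.last_seen_ms.items()
            if now_ms - seen > self.timeout_ms
        )


def safe_state_required(watchdog: HeartbeatWatchdog, now_ms: int) -> bool:
    # In this sketch, any expired heartbeat means the supervisor must react.
    return bool(watchdog.expired(now_ms))


wd = HeartbeatWatchdog(timeout_ms=100)      # assumed timeout
wd.kick("soc_application", now_ms=0)
wd.kick("soc_hypervisor", now_ms=0)
wd.kick("soc_application", now_ms=90)       # the application keeps reporting
late_layers = wd.expired(now_ms=150)        # the hypervisor was last seen at 0 ms
```

The supervising side only needs the time of the last heartbeat per layer, which is why an independent, simpler processor such as an external MCU can host this kind of check.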
The security architecture shown in the figure below represents how the external MCU supports the boot data flow on the SOC and performs an effective secure boot through a standard error reporting/propagation data flow. Among them, the entire program and data boot loading process includes three levels: Boot L1 level CCPLEX, Boot L2 level FSI, and Boot L3 level External MCU.
Boot chain design
At the L1 level, low-level startup uses the Boot and Power Management Processor (BPMP), a smaller Arm core at the heart of the system, to load the low-level boot program onto the BPMP server; the virtual machine hypervisor or the safety operating system (Safety OS) then calls the corresponding boot program files. The BPMP's Cortex-R5 provides:
1. Lock-step core pairing
2. Armv7-R ISA
3. Vector interrupt support: based on daisy chain Arm PL192 vector interrupt controller (AVIC)
4. TCM interface for local SRAM
5. Complete instruction and data cache (which involves 32KB instruction cache I-Cache and 32KB data cache D-Cache)
6. Arm processor correction
At the same time, the underlying iGPU core is driven by the integrated RM server. Finally, tasks such as operating system task scheduling, boot manager loading, and the RM server driving the GPU core are all completed in the first-level boot, L1 CCPLEX (the CPU Complex in NVIDIA terminology, a cluster of high-performance 64-bit Arm cores).
The L2 level mainly involves the Functional Safety Island (FSI) verification mentioned earlier, which is explained separately later in the article.
Finally, the plug-in SMCU can provide an additional layer of security protection and startup management configuration. In this way, the entire chip can be completely driven from a safety perspective.
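The three boot levels can be sketched as a staged sequence in which a failure at any level stops the chain and is reported. This is only an illustration in Python: the function name, the stage checks and the strict L1, L2, L3 ordering are assumptions, and on real hardware the external MCU typically also controls power sequencing for the SoC.

```python
from typing import Callable, List, Tuple

BootStage = Tuple[str, Callable[[], bool]]


def run_boot_chain(stages: List[BootStage]) -> Tuple[bool, List[str]]:
    """Run stages in order; stop and report at the first failing stage."""
    log = []
    for name, stage in stages:
        if not stage():
            log.append(f"{name}: FAILED")
            return False, log
        log.append(f"{name}: ok")
    return True, log


# Hypothetical stage checks; each lambda stands in for a real load-and-verify step.
stages = [
    ("L1 CCPLEX (BPMP firmware, hypervisor, OS)", lambda: True),
    ("L2 FSI (safety island firmware)", lambda: True),
    ("L3 external MCU (supervision handshake)", lambda: False),
]

booted, boot_log = run_boot_chain(stages)
```

Here the third stage is made to fail on purpose, so `booted` is false and the last log entry names the level that needs attention.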
Functional safety island design principles
Figure 2 shows how the FSI and the boot programs of the related low-level module drivers are loaded in NVIDIA series chips. In terms of functional safety, the Orin series sets as its targets ASIL D systematic capability and ASIL B/D random fault management. This includes decomposing ASIL requirements from the SoC hardware down to each core, ensuring that design consistency between cores meets ASIL D requirements, and applying a standard ASIL D development process to the whole functional safety design. From the bottom up, the corresponding safety design covers the safety process, DRIVE AGX, the DRIVE OS operating system, DriveWorks, sensors, redundant architecture design, and safety policies.
The Functional Safety Island (FSI) of NVIDIA series chips is a processor cluster that contains Cortex-R52 and Cortex-R5F real-time processors and has dedicated I/O controllers. For example, the FSI module in Orin-X has its own voltage rail, oscillator, PLL, and SRAM, which minimizes interaction with other modules inside the SoC and keeps them from interfering with one another.
Orin-X series FSI capabilities include:
The Cortex-R52 processor, also known as the safety CPU, has 4 cores in DCLS (dual-core lock-step) mode (8 physical cores in total). It can run the classic AUTOSAR operating system to implement error handling, system fault handling and other customer workloads, with an overall performance of about 10K DMIPS.
The Cortex-R5F processor, also known as the Cryptographic Hardware Security Module (CHSM), is used to run cryptographic and security use cases such as Secure On-Board Communications (SecOC) over the CAN interface.
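To make the SecOC use case concrete, the sketch below shows the general idea of authenticating a message with a truncated MAC and a freshness value. It is a simplified illustration, not the AUTOSAR SecOC frame format: HMAC-SHA256 from the Python standard library is swapped in for the AES-based CMAC that SecOC deployments commonly use, and the 4-byte MAC length, field widths and function names are assumptions.

```python
import hashlib
import hmac

MAC_LEN = 4  # bytes of truncated MAC carried in the frame (assumed value)


def _mac(key: bytes, data_id: int, payload: bytes, freshness: int) -> bytes:
    # The MAC covers the message identifier, the payload and the freshness value.
    msg = data_id.to_bytes(2, "big") + payload + freshness.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:MAC_LEN]


def protect(key: bytes, data_id: int, payload: bytes, freshness: int) -> bytes:
    """Sender side: append a truncated MAC to the payload."""
    return payload + _mac(key, data_id, payload, freshness)


def verify(key: bytes, data_id: int, frame: bytes, freshness: int) -> bool:
    """Receiver side: recompute the MAC with the expected freshness value."""
    payload, tag = frame[:-MAC_LEN], frame[-MAC_LEN:]
    return hmac.compare_digest(tag, _mac(key, data_id, payload, freshness))


key = b"demo-key-not-for-production"
frame = protect(key, data_id=0x123, payload=b"\x01\x02\x03\x04", freshness=7)

accepted = verify(key, 0x123, frame, freshness=7)
replayed = verify(key, 0x123, frame, freshness=8)            # stale freshness value
tampered = verify(key, 0x123, b"\xff" + frame[1:], freshness=7)
```

Keeping the key and the MAC computation on a dedicated core such as the CHSM means the application cores never need to handle the key material directly.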
The entire FSI mechanism generally includes the following security instructions and control interface information:
1. Tightly coupled memory, instruction cache and data cache for each core of the safety CPU and the CHSM CPU.
2. There is a total of 5MB of on-chip dedicated RAM on the safety island to ensure that code execution and data storage can remain within the FSI.
3. Dedicated I/O interfaces on the island for communicating with external components, comprising 1 UART and 4 GPIO ports.
4. Hardware safety mechanisms for all IPs in the FSI, such as DCLS, CRC, ECC, parity checks and timeouts.
5. Dedicated thermal, voltage and frequency monitors.
6. Logical isolation to ensure sufficient freedom from interference (FFI) from other parts of the SoC.
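Several of the hardware safety mechanisms listed above follow simple principles that can be shown in software. The sketch below is an analogy only: the function names are hypothetical, CRC-32 from `zlib` stands in for whatever polynomial the hardware uses, and real DCLS compares two physical cores in hardware rather than calling a function twice.

```python
import zlib


def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def crc_ok(data: bytes, stored_crc: int) -> bool:
    """Compare a freshly computed CRC-32 against the stored value."""
    return zlib.crc32(data) == stored_crc


def lockstep(fn, *args):
    """Run the same computation twice and compare, mimicking DCLS checking."""
    primary = fn(*args)
    shadow = fn(*args)
    if primary != shadow:
        raise RuntimeError("lockstep mismatch")
    return primary


block = b"safety-critical data"
stored = zlib.crc32(block)

corrupted = bytes([block[0] ^ 0x01]) + block[1:]   # single bit flip
parity = even_parity_bit(0b1011)                    # three 1 bits, so parity is 1
result = lockstep(lambda a, b: a + b, 2, 3)
```

In each case the check detects a fault but does not by itself decide the reaction; that decision belongs to the error handling running on the safety CPU.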
FSI example analysis 1:
This section uses an example to illustrate the kind of error handling the FSI is designed for, covering the following:
1. Various methods of debugging CSI capture errors on Xavier-based platforms.
2. How to determine which method to use for debugging.
3. How to identify errors.
4. Possible root causes of the error.
The layers where errors may occur during camera capture are as follows:
Whenever errors are encountered while decoding CSI packets received at the SoC CSI interface and writing the raw frame data to memory,
the VI hardware engine notifies RCE of these errors.
The capture stack running on CCPLEX can query the capture status from the RCE and display:
ID of the CSI stream, ID of the VC where the error occurred, error type, detailed error for each error type.
Error status provides a good starting point for determining the root cause and determining next steps.
If the VI engine neither captures the frame successfully nor encounters an error and reports it to the RCE, a frame start or frame end timeout error is displayed.
This may be due to one of two reasons:
1. The deserializer is not streaming data.
2. The VI channel is not configured to capture the correct data type/VC id.
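The triage logic described above can be summarized as a small decision function. This is a sketch under stated assumptions: `CaptureStatus`, its field names and the error-type strings are hypothetical and do not correspond to an actual NVIDIA capture API; they only mirror the information the text says the RCE reports.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT_ERRORS = {"frame_start_timeout", "frame_end_timeout"}


@dataclass
class CaptureStatus:
    """Hypothetical status record; field names are illustrative only."""
    stream_id: int
    vc_id: int
    error_type: Optional[str] = None   # None when the frame was captured cleanly
    detail: str = ""


def triage(status: CaptureStatus) -> list:
    """Return a list of next debugging steps for one capture status."""
    if status.error_type is None:
        return []
    where = f"CSI stream {status.stream_id}, VC {status.vc_id}"
    if status.error_type in TIMEOUT_ERRORS:
        # Nothing was captured and no error reached the RCE: suspect the two causes above.
        return [
            f"{where}: check that the deserializer is streaming data",
            f"{where}: check the VI channel data type and VC id configuration",
        ]
    # The VI engine reported an error: start from its type and detail.
    return [f"{where}: inspect {status.error_type} ({status.detail})"]


steps = triage(CaptureStatus(stream_id=0, vc_id=1, error_type="frame_start_timeout"))
```

Separating timeouts from reported errors matches the text: a reported error carries its own detail, while a timeout only says that nothing arrived.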
How NVIDIA series chips consider information security
For NVIDIA chips, information security is ensured mainly at two levels. The first is the Tegra information security core module. This chip did suffer a security vulnerability in 2018: a hacker exploited a flaw in the NVIDIA Tegra X1 chip to crack the Switch console. The vulnerability allows anyone to run arbitrary code on the device, which means that homebrew systems and pirated software can run at will. Through continued internal work, however, NVIDIA has largely patched the vulnerability.
For example, on the original Nintendo Switch console, which uses the same Tegra X1 chip, a vulnerability was discovered in the ROM bootloader that could be exploited via recovery mode and a buffer overflow. NVIDIA can address this type of flaw by using built-in programmable fuses to store patches for the internal ROM. This covers weak points while preserving security, and reduces the impact of code bugs.
As another example, the nature of the hardware design means that certain internal hardware modules are inaccessible to CCPLEX and only BPMP can manipulate them. All low-level boot steps, including u-boot, can be secured through signed binaries. Their keys can be stored in one-time programmable fuses in the CPU. U-boot itself can be configured to use signed FIT images, thus providing a secure boot chain all the way to the Linux kernel. Both the original ROM bootloader and TegraBoot also support fully redundant boot paths.
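The idea of a boot chain anchored in one-time programmable fuses can be illustrated with a simplified chain of trust. Note the substitution: real secure boot verifies cryptographic signatures against a fused public key (or its hash), whereas this sketch uses plain SHA-256 digests so that it runs with the Python standard library alone. The function names and image contents are hypothetical.

```python
import hashlib
from typing import List, Tuple

Stage = Tuple[bytes, str]   # (image, digest that authenticates the next stage)


def digest(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


def build_chain(images: List[bytes]) -> Tuple[str, List[Stage]]:
    """Pair each image with the digest of the next stage; return the root digest."""
    stages = []
    next_digest = ""                       # the last stage has nothing to verify
    for image in reversed(images):
        stages.append((image, next_digest))
        next_digest = digest(image + next_digest.encode())
    stages.reverse()
    return next_digest, stages             # root digest plays the role of the fuses


def verify_chain(fused_root: str, stages: List[Stage]) -> bool:
    """Each stage is checked against the value trusted by the previous one."""
    expected = fused_root
    for image, next_digest in stages:
        if digest(image + next_digest.encode()) != expected:
            return False
        expected = next_digest
    return True


root, stages = build_chain([b"bootloader", b"u-boot", b"kernel FIT image"])
genuine = verify_chain(root, stages)

modified = list(stages)
modified[2] = (b"patched kernel", "")      # attacker swaps the last stage
rejected = not verify_chain(root, modified)
```

Because only the root value has to be immutable, everything after it can be updated in the field as long as each stage is re-authenticated by the one before it.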
Tegra is the information security core unique to NVIDIA chips. The Tegra Security Controller (TSEC) is an information security subsystem with its own root-of-trust ROM, IMEM, DMEM, crypto accelerators (AES, SHA, RNG, PKA), critical links and critical storage. TSEC provides an on-chip TEE (Trusted Execution Environment) that can run NVIDIA-signed code. A typical use of TSEC is secure video playback, where it handles the HDCP 1.x and 2.x link authentication and the complete link integrity checks required for secure operation.
Overall, TSEC can support:
1) HDCP: HDCP 1.4 on line-end HDMI 1.4, and HDCP 2.0/2.1 on line-end HDMI 2.3. HDCP link management does not expose protected content and requires no software keys running on the CPU. There are two software-programmable independent command queues for HDCP link management (holding up to 16 commands), and the chip can disable HDMI output independently of the player if an HDCP status check fails.
2) Platform Security Controller (PSC): a high-security subsystem that protects and manages assets (keys, fuses, functions, features) in the SoC, provides trusted services, strengthens the SoC's defenses against attack, and raises the subsystem's own level of protection against software and hardware attacks.
3) Key management and protection: the PSC is the only mechanism with access to the most critical keys in the chip. This subsystem represents the highest level of protection in Orin-X, and the subsystem itself is highly resilient to a variety of software and hardware attacks.
4) Trusted services: for example, during SoC secure boot, the primary PSC services cover secure authentication, provisioning of additional keys/IDs/data, key access and management, random number generation, and trusted time reporting.
5) Information security monitoring: the PSC is responsible for regular security management tasks, including continuously assessing the security posture of the SoC, proactively monitoring for known or potential attack patterns (e.g., voltage faults or thermal attacks), mitigating the risk of hardware attacks, and taking effective action if an attack is detected. The PSC can also accept software updates as workarounds to improve the robustness of systems in the field.
The second is the Security Engine (SE), which provides hardware acceleration for encryption algorithms.
Software can use two SE instances. The first, TZ-SE, can only be accessed by TrustZone software. The second, NS/TZ-SE, can be configured for access by either TrustZone software or non-secure software. The SE provides hardware acceleration and hardware-backed key protection for various encryption algorithms, which software can use to build cryptographic protocols and security features. All cryptographic operations are based on algorithms approved by the National Institute of Standards and Technology (NIST).
NVIDIA’s Security Engine SE can support all information security capabilities including the following:
NIST-compliant symmetric and asymmetric encryption and hashing algorithms; side-channel countermeasures (AES/RSA/ECC); independent parallel channels; hardware Key Access Control (KAC), a rule-based mechanism that strengthens hardware access control for symmetric keys; 16 AES keyslots and 4 RSA/ECC keyslots; hardware keyslot isolation (AES keyslots only); read protection (AES keyslots only); hardware keyslot functions; key wrapping/unwrapping (AES -> AES keyslot); key derivation into a keyslot (KDF -> AES keyslot); and random key generation (RNG -> AES keyslot).
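The keyslot concept, where software can use a key but never read it, can be modeled in a few lines. This toy model is an assumption-laden sketch, not the SE programming interface: the class and method names are invented, HMAC-SHA256 stands in for the AES operations the hardware performs, and Python attribute privacy is only a convention, whereas the hardware enforces read protection physically.

```python
import hashlib
import hmac
import secrets


class KeyslotStore:
    """Toy model of hardware keyslots: keys are usable but never returned."""

    def __init__(self, num_slots: int = 16):
        self._keys = [None] * num_slots
        self._secure_only = [True] * num_slots

    def generate(self, slot: int, secure_only: bool = True) -> None:
        # RNG -> keyslot: the random key never leaves the store.
        self._keys[slot] = secrets.token_bytes(32)
        self._secure_only[slot] = secure_only

    def derive(self, src: int, dst: int, label: bytes) -> None:
        # KDF -> keyslot: derive a child key from a parent slot.
        if self._keys[src] is None:
            raise KeyError("empty source keyslot")
        self._keys[dst] = hmac.new(self._keys[src], label, hashlib.sha256).digest()
        self._secure_only[dst] = self._secure_only[src]

    def mac(self, slot: int, message: bytes, caller_is_secure: bool) -> bytes:
        # Access rule: secure-only slots reject non-secure callers.
        if self._keys[slot] is None:
            raise KeyError("empty keyslot")
        if self._secure_only[slot] and not caller_is_secure:
            raise PermissionError("keyslot restricted to secure world")
        return hmac.new(self._keys[slot], message, hashlib.sha256).digest()


store = KeyslotStore()
store.generate(0)                        # root key, secure world only
store.derive(0, 1, b"session")           # child key inherits the restriction
tag = store.mac(1, b"hello", caller_is_secure=True)
```

The public interface deliberately has no method that returns key bytes; callers refer to keys only by slot index, which is the property the keyslot design is meant to provide.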
Summary
This article has analyzed the main features and strategic advantages of NVIDIA chips in application from the perspectives of core architecture, functional safety, and information security. When developing with NVIDIA series chips, fully understanding the internal architecture and combining it with the functional safety and information security capabilities is particularly important for development and tuning. Subsequent articles will give a detailed analysis from the perspectives of hardware development and software development.