Typical architecture design scheme for autonomous driving based on NVIDIA chips

Publisher: qinghong  Latest update time: 2023-05-10  Source: elecfans

NVIDIA DRIVE AGX is a scalable, open computing platform that serves as the brain of an autonomous vehicle. As a leading hardware platform in its class, it provides high-performance, energy-efficient computing for functionally safe, AI-based autonomous driving. On the hardware side, the NVIDIA DRIVE embedded supercomputing platform processes data from cameras, radar, and lidar sensors to perceive the surrounding environment, localize the car on the map, and then plan and execute a safe driving route. On the software side, NVIDIA DRIVE AGX is scalable and software-defined, giving autonomous vehicles the performance to process large amounts of sensor data and make real-time driving decisions. The open NVIDIA DRIVE software stack also helps developers use redundant and diverse deep neural networks (DNNs) to build perception, mapping, planning, and driver monitoring functions, and the platform grows more capable through continuous iteration and over-the-air updates.

The open NVIDIA DRIVE SDK provides developers with the building blocks and algorithm stacks required for autonomous driving. It helps them build and deploy advanced autonomous driving applications more efficiently, including perception, localization and mapping, planning and control, driver monitoring, and natural language processing. This article is divided into several chapters and takes the widely used NVIDIA Orin-X chip as an example to explain, from both the hardware and software perspectives, how to develop on and apply the platform.


1. NVIDIA internal architecture design

Taking Orin-X as an example, the CPU side includes a main CPU complex based on Arm Cortex-A78AE, which provides general-purpose high-speed computing, and a functional safety island (FSI) based on Arm Cortex-R52, which provides isolated on-chip computing resources and reduces the need for an external ASIL D functional safety CPU.


The GPU is an NVIDIA Ampere GPU, which provides advanced parallel computing through CUDA and supports a variety of tools, such as TensorRT, a deep learning inference optimizer and runtime that delivers low latency and high throughput. Ampere also provides state-of-the-art graphics capabilities, including real-time ray tracing. Domain-specific hardware accelerators (DSAs) are a set of dedicated hardware engines designed to offload various computing tasks from the compute engines and perform them with high throughput and high energy efficiency.

The following diagram shows the high-level architecture of the SoC, divided into three main processing complexes: CPU, GPU, and hardware accelerators.

[Figure: high-level architecture of the Orin SoC, showing the CPU, GPU, and hardware accelerator complexes]

The internal architecture of the chip is divided into functional blocks: the low-level QNX BSP (clock sources and system restart; CAN, SPI, I2C, GPIO, and UART controllers; configuration registers; system configuration); the QNX real-time operating system (RTOS); the NVIDIA multimedia processing module (the R5-based sensor processing engine, which configures real-time camera input, plus the PVA, DLA, and audio processor); the classic AUTOSAR processing module (running on the lock-step Cortex-R52 cores of the safety island); the safety services (Arm Cortex-A78AE CPU complex, coherent CPU switch fabric, and the PSC for information security); and the neural network processing module (CUDA and TensorRT).


2. Typical architecture design of autonomous driving based on NVIDIA chips

A conventional SoC-based system architecture usually uses an SoC+MCU dual-chip design, or even a three-chip design. Thanks to its computing performance, the SoC is generally better suited than the MCU to compute-heavy front-end tasks such as perception and planning.


The MCU, with its high functional safety level, can be used to verify outputs before control execution. The industry has long been divided on whether NVIDIA chips can, like the TDA4, serve as a single highly heterogeneous chip that handles all tasks on its own. In principle, both the Xavier and Orin series offer rich AI and CPU computing capability, which is sufficient to cover the entire solution design for autonomous driving systems at L2+ and above.


So is the industry actually adopting such a single-chip design? The answer is no.

According to the latest safety requirements in NVIDIA's datasheet, the recommended architecture for Orin series chips still requires a dedicated MCU for failure analysis and risk assessment, so that serious system failures can be located in a timely manner and the safety integrity requirements for autonomous driving defined by ISO 26262 are met (this is explained separately later). In addition, for power management of the whole domain controller, an external MCU greatly improves control over functions such as entering and exiting sleep mode.


An MCU used in this way can be called a Safety MCU (SMCU). System development calls for an MCU with a high safety level (generally ASIL D), such as the Infineon AURIX TC series or the Renesas RH850 series, connected to Orin as its SMCU. Such an SMCU handles power control and severe-failure avoidance for the entire system.
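The SMCU's two roles described above (power control, including sleep entry and exit, and reaction to severe failures) can be pictured as a small state machine. The sketch below is purely illustrative: the state names, event names, and transition table are assumptions made for this example and are not taken from NVIDIA, Infineon, or Renesas documentation.

```python
from enum import Enum, auto


class SocState(Enum):
    OFF = auto()
    RUNNING = auto()
    SLEEP = auto()
    SAFE_STATE = auto()


# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    (SocState.OFF, "power_on"): SocState.RUNNING,
    (SocState.RUNNING, "sleep_request"): SocState.SLEEP,
    (SocState.SLEEP, "wake_request"): SocState.RUNNING,
    (SocState.RUNNING, "power_off"): SocState.OFF,
    (SocState.SLEEP, "power_off"): SocState.OFF,
}


class SafetyMcu:
    """Toy model of an external safety MCU supervising the SoC power state."""

    def __init__(self):
        self.state = SocState.OFF

    def handle(self, event):
        # A severe fault always forces the safe state, whatever the current state.
        if event == "severe_fault":
            self.state = SocState.SAFE_STATE
            return self.state
        # Once in the safe state, only a full power-off is accepted.
        if self.state is SocState.SAFE_STATE:
            if event == "power_off":
                self.state = SocState.OFF
            return self.state
        # Unknown or illegal events leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

In this model a wake request received after a severe fault is ignored, which mirrors the idea that the SMCU, not the SoC, has the final say over whether the system may resume normal operation.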


The figure above shows the three-layer fail-safe framework of a system architecture based on NVIDIA chips. The architecture implements three levels of fail-safe protection, spanning the basic services of the SoC layer, the operating system, the virtual machine, the application runtime environment, and the real-time operating environment of the MCU. The SoC layer and the MCU layer perform independent health and watchdog monitoring over NvIVC, NvIPC, and the SPI/error pin, respectively. The SoC itself carries part of the lockstep safety checking in the lockstep FSI, and runs the hypervisor on the main CPU complex, CCPLEX (the Carmel CPU complex running the capture stack and applications). The CPU cores run the QNX operating system, which has a high functional safety level, to schedule resources for the application software watchdog, middleware, application-layer software, and driver software. The real-time operating environment still runs on standard AUTOSAR.
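The watchdog monitoring described above can be sketched as a heartbeat monitor with one deadline per monitored channel. This is a minimal illustration under stated assumptions: the channel keys (`nvivc`, `nvipc`, `spi`) simply reuse the names from the text, the timeout values are invented, and the class does not use or represent any real NVIDIA API.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per channel; all times are in milliseconds."""

    def __init__(self, timeouts_ms, now_ms=0):
        # timeouts_ms: channel name -> maximum allowed silence (hypothetical values).
        self.timeouts_ms = dict(timeouts_ms)
        # Treat start-up as an implicit heartbeat on every channel.
        self.last_seen_ms = {name: now_ms for name in self.timeouts_ms}

    def kick(self, channel, now_ms):
        """Record a heartbeat received on the given channel."""
        if channel not in self.timeouts_ms:
            raise KeyError(f"unknown channel: {channel}")
        self.last_seen_ms[channel] = now_ms

    def expired(self, now_ms):
        """Return the channels whose heartbeat is overdue, sorted by name."""
        return sorted(
            name
            for name, limit in self.timeouts_ms.items()
            if now_ms - self.last_seen_ms[name] > limit
        )


monitor = HeartbeatMonitor({"nvivc": 100, "nvipc": 100, "spi": 500})
monitor.kick("nvivc", 80)
overdue = monitor.expired(150)  # only "nvipc" has been silent for more than 100 ms
```

A supervisor on either side would poll `expired` periodically and escalate (for example, by asserting an error pin) when the list is non-empty. Passing time in explicitly, rather than reading a clock inside the class, keeps the sketch deterministic.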


The security architecture in the figure below shows how the external MCU supports the boot data flow on the SoC and achieves an effective secure boot through a standard error reporting and propagation data flow. The program and data boot loading process spans three levels: Boot L1 CCPLEX, Boot L2 FSI, and Boot L3 External MCU.

[Figure: security architecture and boot data flow across Boot L1 CCPLEX, Boot L2 FSI, and Boot L3 External MCU]

Boot chain design

During L1 startup, the lowest-level boot uses the Boot and Power Management Processor (BPMP), a small Arm core at the heart of the system, to load the low-level boot program onto the BPMP server; the hypervisor or the safety OS then calls the corresponding boot program files. The Cortex-R5 in the BPMP provides:

1. Lock-step core pairing

2. Armv7-R ISA

3. Vectored interrupt support, based on daisy-chained Arm PL192 vectored interrupt controllers (AVIC)

4. TCM interface for local SRAM

5. Complete instruction and data cache (including 32KB instruction cache I-Cache and 32KB data cache D-Cache)

6. Arm processor revision

Meanwhile, the underlying iGPU core is driven by the integrated RM server. Finally, the first-level boot stage, L1 CCPLEX (NVIDIA's name for the CPU complex, a cluster of high-performance 64-bit Arm cores), handles tasks such as operating system task scheduling, boot manager loading, and driving the GPU core through the RM server.


Level 2 mainly involves the functional safety island (FSI) mentioned earlier, which is explained separately below.


Finally, the external SMCU can provide an additional layer of security protection and boot management configuration, so that the entire chip can be fully driven from a security perspective.
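The three boot levels and the error reporting flow described above can be modeled as a staged sequence that stops at the first failing stage and reports it. The sketch treats the levels as a simple ordered chain for illustration only; the source does not specify the real sequencing between them, and `BootReport`, `run_boot_chain`, and the per-stage check callables are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Stage names taken from the text; the strict ordering is an assumption.
BOOT_ORDER = ["L1 CCPLEX", "L2 FSI", "L3 External MCU"]


@dataclass
class BootReport:
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.failed_stage is None


def run_boot_chain(stage_checks: Dict[str, Callable[[], bool]]) -> BootReport:
    """Run each stage check in order; stop and report at the first failure.

    Each callable stands in for a real boot step and returns True on success.
    """
    report = BootReport()
    for stage in BOOT_ORDER:
        if not stage_checks[stage]():
            report.failed_stage = stage  # propagate the error upward
            break
        report.completed.append(stage)
    return report


checks = {stage: (lambda: True) for stage in BOOT_ORDER}
checks["L2 FSI"] = lambda: False  # inject a failure at the FSI level
report = run_boot_chain(checks)
```

With the injected failure, `report.completed` contains only the first stage and `report.failed_stage` names the FSI level, which is the kind of information an external MCU would need in order to decide how to react.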


3. Functional Safety Island Design Principles

Figure 2 shows how the FSI and the related low-level module driver boot programs are loaded in NVIDIA series chips. In terms of functional safety, the Orin series targets ASIL D systematic capability and ASIL B/D random fault management. This involves decomposing ASIL requirements from the SoC hardware down to each core, ensuring that inter-core design consistency meets ASIL D requirements, applying a standard ASIL D development process to the entire functional safety design, and carrying out the corresponding safety design from the bottom up across the safety process, DRIVE AGX, the DRIVE OS operating system, DriveWorks, sensors, the redundant architecture, and safety strategies.

[Figure 2: loading of the FSI and the related low-level module driver boot programs]
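The ASIL decomposition mentioned above follows the schemes in ISO 26262-9, under which a requirement may be split across two sufficiently independent elements. A convenient way to remember the permitted two-way schemes is that, with QM=0, A=1, B=2, C=3, D=4, the two parts must add up to the original level. The helper below is a minimal sketch of that rule; the function names are illustrative, and it does not check the independence conditions the standard also requires.

```python
ASIL_RANK = {"QM": 0, "A": 1, "B": 2, "C": 3, "D": 4}


def is_valid_decomposition(original, part_a, part_b):
    """Check a two-way split against the ISO 26262-9 decomposition schemes.

    The permitted schemes (for example D -> C(D) + A(D), B(D) + B(D),
    D(D) + QM(D)) are the pairs whose ranks sum to the original rank.
    """
    if original == "QM":
        return False  # QM requirements are not decomposed
    return ASIL_RANK[part_a] + ASIL_RANK[part_b] == ASIL_RANK[original]


def decompositions(original):
    """List every permitted unordered pair (higher part first)."""
    levels = list(ASIL_RANK)  # insertion order: QM, A, B, C, D
    return [
        (high, low)
        for i, low in enumerate(levels)
        for high in levels[i:]
        if is_valid_decomposition(original, high, low)
    ]


asil_d_options = decompositions("D")
```

For ASIL D this yields the three familiar options: D with QM, C with A, and B with B.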

The functional safety island (FSI) of the NVIDIA series of chips is a processor cluster containing Cortex-R52 and Cortex-R5F real-time processor cores with dedicated I/O controllers. For example, the FSI module in Orin-X has its own voltage rail, oscillator, PLL, and SRAM, which keeps interaction with other modules inside the SoC to a minimum and ensures freedom from interference between them.


Orin-X series FSI features include:

The Cortex-R52 processor, also known as the safety CPU, has 4 cores in DCLS (dual-core lockstep) mode (8 physical cores in total). It can run the classic AUTOSAR operating system, implement error handling, system fault handling, and other customer workloads, and delivers a combined performance of approximately 10K DMIPS.
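The idea behind dual-core lockstep can be illustrated with a software analogy: the same computation runs twice and the result is released only if both copies agree. Real DCLS is implemented in hardware, with comparison logic between the two cores, so the sketch below is only a conceptual model; `run_in_lockstep`, `make_faulty_adder`, and the fault injection are hypothetical constructs for this example.

```python
class LockstepMismatch(Exception):
    """Raised when the two redundant executions disagree."""


def run_in_lockstep(func, *args):
    """Execute func twice with the same inputs and compare the results."""
    primary = func(*args)
    shadow = func(*args)
    if primary != shadow:
        raise LockstepMismatch(f"{primary!r} != {shadow!r}")
    return primary


def physical_cores(lockstep_cores):
    """Each lockstep core is backed by a pair of physical cores."""
    return lockstep_cores * 2


def make_faulty_adder(fail_on_call):
    """Build an adder that flips the lowest result bit on one chosen call."""
    calls = {"n": 0}

    def add(a, b):
        calls["n"] += 1
        result = a + b
        if calls["n"] == fail_on_call:
            result ^= 1  # simulate a transient single-bit fault
        return result

    return add


total = run_in_lockstep(lambda a, b: a + b, 2, 3)
cores = physical_cores(4)  # 4 lockstep cores, as stated for the Orin-X FSI
```

Calling `run_in_lockstep(make_faulty_adder(2), 2, 3)` raises `LockstepMismatch`, because the injected fault affects only the second execution. This is the kind of random fault lockstep is meant to catch, and it also shows why 4 usable cores cost 8 physical ones.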
