From NVIDIA's autonomous driving chip Thor, we can see the development trend of big chips

Publisher: SereneWanderer | Last updated: 2022-09-28

At NVIDIA's GTC 2022 fall keynote in the early morning of September 21, Beijing time, CEO Jensen Huang announced the company's autonomous driving chip slated for 2024. Its 2000 TFLOPS of performance is so formidable that NVIDIA named it Thor, replacing the previously announced 1000 TOPS Atlan.

 

The release of Thor marks the automotive industry's shift from distributed ECUs and DCUs to a single, fully centralized chip with integrated functions. It also exposes a harsh reality: many ADAS chip companies building DCU-level products have already fallen behind while those products are still on the drawing board.

 

Data centers for cloud and edge computing, as well as super-terminal applications such as autonomous driving, are typical complex computing scenarios, and their computing platforms are typical high-computing-power chips.

 

The development trend of large chips is becoming increasingly clear: from the separation into GPUs and DSAs, to the re-integration seen in DPUs and super terminals, and in the future to further fusion into a super-heterogeneous macro-system chip (Macro-SoC).

 

1 NVIDIA's autonomous driving chip Thor

 

1.1 Development Trend of Autonomous Driving Car Chips

 

 

The figure above is Bosch's schematic of the evolution of automotive electrical/electronic (E/E) architecture: from module-level ECUs, to domain controllers that concentrate related functions, to fully centralized vehicle computers. Each stage is further divided into two sub-stages; for example, the fully centralized vehicle computer stage includes both local computing and vehicle-cloud collaboration.

 

There is no public architecture diagram for NVIDIA Atlan yet (and Thor has only just been released, so I have not found a comparable diagram for it either), but we can see that the design philosophy behind Atlan and Thor is pure "endgame thinking." It goes one step beyond Bosch's step-by-step evolution, leaping past the centralized on-board computer and cloud-coordinated on-board computer stages straight to a cloud-integrated on-board computer. Cloud integration means that services can run dynamically and adaptively either in the cloud or on the vehicle, making it easy to rebalance cloud resources. Atlan and Thor adopt a computing architecture fully consistent with the cloud: Grace-next CPU, Ampere-next GPU, and BlueField DPU, so cloud integration can be achieved at the hardware level.

 

1.2 Comparison of Intel Mobileye, Qualcomm and NVIDIA chip computing power
 

 

We can see that EyeQ Ultra, the highest-computing-power L4/L5 chip that Mobileye plans to release in 2023, delivers only 176 TOPS.

 

 

From the figure above, Qualcomm's planned L4/L5 autonomous driving solution delivers 700+ TOPS and consists of four chips: two application processors (APs) and two dedicated accelerators.

 

 

By comparison, Atlan, the L4/L5 autonomous driving chip NVIDIA had previously planned, offers 1000 TOPS of computing power.

 

 

Then came NVIDIA's trump card: it scrapped Atlan altogether and replaced it with a new chip named Thor, delivering an astonishing 2000 TFLOPS of computing power.

 

 

After NVIDIA released Thor, Qualcomm "quickly" announced its own four-chip, 2000 TOPS solution.

 

1.3 A single chip handles multi-domain computing that would normally require five or more chips
 

 

 

NVIDIA Thor provides 2000 TFLOPS of computing power (compared with the 1000 TOPS planned for Atlan).

 

 

The Thor SoC is capable of multi-domain computing: it can partition tasks such as autonomous driving and in-vehicle entertainment. Typically these functions are handled by dozens of control units distributed throughout the vehicle. Instead of relying on distributed ECUs/DCUs, manufacturers can use Thor to consolidate all of these functions onto a single chip.

 

 

This isolation between computing domains allows concurrent, time-sensitive processes to run without interruption. Through virtualization, Linux, QNX, Android and other operating systems can run simultaneously on a single computer.
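As a purely illustrative sketch of what such a partitioned configuration might look like, consider the snippet below; the domain names, guest OS assignments, core sets and memory sizes are hypothetical and are not taken from NVIDIA's DRIVE software stack.

```python
# Hypothetical partition plan for a centralized vehicle computer.
# Each guest OS gets its own CPU cores and a private memory budget,
# so a crash or overload in one domain cannot disturb the others.

PARTITIONS = {
    "adas":         {"guest_os": "QNX",     "cpu_cores": {0, 1, 2, 3}, "mem_gib": 16},
    "cockpit":      {"guest_os": "Android", "cpu_cores": {4, 5},       "mem_gib": 8},
    "infotainment": {"guest_os": "Linux",   "cpu_cores": {6, 7},       "mem_gib": 8},
}

def check_no_core_sharing(partitions):
    """Verify that no CPU core is assigned to more than one guest."""
    seen = set()
    for name, p in partitions.items():
        overlap = seen & p["cpu_cores"]
        if overlap:
            raise ValueError(f"cores {overlap} are shared with '{name}'")
        seen |= p["cpu_cores"]

if __name__ == "__main__":
    check_no_core_sharing(PARTITIONS)
    for name, p in PARTITIONS.items():
        print(f"{name}: {p['guest_os']} on cores {sorted(p['cpu_cores'])}, {p['mem_gib']} GiB")
```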

 

2 The essential difference between autonomous driving SOC and mobile phone SOC

 

Here we introduce a concept: complex computing. Complex computing means that, on top of a traditional AP/OS system, the platform must also support virtualization and service orientation, allowing multiple systems to coexist on a single device and multiple systems to collaborate across devices. If an AP-level system counts as one system, then complex computing is a macro system composed of many such systems.

 

 

On traditional APs such as mobile phones, tablets and personal computers, an operating system is deployed and various applications run on top of it. The entire device is a single system, and individual processes and threads can interfere with one another's performance.

 

A platform that supports full hardware virtualization (of the CPU, memory, I/O, the various accelerators and so on), by contrast, must not only divide the macro system into multiple independent systems, but also physically isolate those systems from one another in terms of applications, data and performance.

 

Self-driving cars usually need to support five main functional domains: the powertrain domain, body domain, autonomous driving domain, chassis domain, and infotainment domain. A centralized super-terminal chip for an autonomous vehicle must therefore implement complete hardware virtualization and fully isolate each functional domain so that they cannot interfere with one another.
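To make "complete isolation" concrete, here is a minimal sketch that checks a static memory map for overlaps between the five functional domains named above; the base addresses and sizes are invented purely for illustration.

```python
# Hypothetical static memory map for the five functional domains.
# Each domain owns a [start, end) window; windows must not overlap
# if the domains are to be physically isolated from each other.

DOMAIN_MEMORY_MAP = {
    "powertrain":   (0x0_0000_0000, 0x0_4000_0000),  # 1 GiB
    "body":         (0x0_4000_0000, 0x0_8000_0000),  # 1 GiB
    "autonomous":   (0x0_8000_0000, 0x4_8000_0000),  # 16 GiB
    "chassis":      (0x4_8000_0000, 0x4_C000_0000),  # 1 GiB
    "infotainment": (0x4_C000_0000, 0x6_C000_0000),  # 8 GiB
}

def assert_disjoint(memory_map):
    """Fail if any two domains' memory windows overlap."""
    regions = sorted(memory_map.items(), key=lambda kv: kv[1][0])
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(regions, regions[1:]):
        if start_b < end_a:
            raise ValueError(f"{name_a} and {name_b} overlap")

if __name__ == "__main__":
    assert_disjoint(DOMAIN_MEMORY_MAP)
    print("all five functional domains have disjoint memory windows")
```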

 

We call these virtualized, multi-system computing scenarios complex computing, and chips that support complex computing can be considered "big" chips. Such scenarios currently include cloud computing, supercomputing, edge computing, data centers for 5G/6G core networks, and super terminals for applications such as autonomous driving and the metaverse.

 

3 Customizing an ASIC/SOC is meaningless in the face of an absolute computing power advantage

 

With the growth of cloud computing, the ever-closer collaboration and even integration of cloud, network, edge and end, and the increasing scale of these systems, the development path of ASICs and traditional ASIC-based SOCs is heading toward a dead end. The simpler a system is, the less it changes; the more complex it is, the more it changes. Complex macro systems must iterate rapidly, and different users have very different requirements, so the traditional ASIC approach is bound to run into serious difficulties in complex computing scenarios.

 

In the field of autonomous driving, without using an acceleration engine, traditional SOCs can achieve AI computing power of about 10 TOPS; many companies have quickly improved computing power by customizing acceleration engines, which can increase AI computing power to 100 or even 200 TOPS. However, there are many problems with the implementation of traditional SOCs:

 

  • Autonomous driving algorithms and the various upper-layer applications evolve and upgrade rapidly. A customized ASIC's useful life will be very short, because its functions are fixed and vehicles then struggle to receive more advanced system upgrade packages. ASICs therefore cannot properly support functional upgrades over the vehicle's full life cycle.

 

  • The entire industry is evolving rapidly. Once it reaches the L4/L5 stage, much of today's work will be rendered worthless: the chip architecture, the customized ASIC engines, and the entire software stack and framework built on top of them will all have to be redone from scratch.

 

I am increasingly convinced that building custom ASICs into large chips is a nightmare; the way out is to decouple software from hardware to some extent and build general-purpose chips. Only once software and hardware are decoupled can hardware engineers focus on pushing computing power higher, while software engineers concentrate on their own algorithm optimization and business innovation without worrying about the details of the underlying hardware.
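As an illustration of what software/hardware decoupling looks like at the API level, here is a minimal sketch; the backend names and the matmul operation are chosen for illustration and are not taken from any vendor SDK. The application codes against a stable interface, and the hardware behind it can change without touching the application.

```python
from abc import ABC, abstractmethod

class Backend(ABC):
    """Stable software-facing interface; hardware vendors implement it."""

    @abstractmethod
    def matmul(self, a, b):
        """Multiply two matrices given as lists of rows."""

class CpuBackend(Backend):
    def matmul(self, a, b):
        cols_b = list(zip(*b))
        return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

class NpuBackend(Backend):
    """Placeholder for a hypothetical accelerator; here it reuses the CPU path."""
    def matmul(self, a, b):
        return CpuBackend().matmul(a, b)

def perception_step(backend: Backend):
    # Application code: unchanged whether it runs on a CPU, GPU or a new NPU.
    features = [[1.0, 2.0], [3.0, 4.0]]
    weights = [[0.5, 0.0], [0.0, 0.5]]
    return backend.matmul(features, weights)

if __name__ == "__main__":
    print(perception_step(CpuBackend()))  # same result on any backend
    print(perception_step(NpuBackend()))
```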

 

At the same resource cost, general-purpose chips sacrifice some performance in exchange for generality. Building general-purpose large chips therefore also demands innovation:

 

  • An innovative architecture is needed that preserves sufficient generality while still delivering extreme, order-of-magnitude improvements in performance;


  • The architecture must be forward compatible and support platform-based, ecosystem-oriented design;


  • We need to take a more macro perspective and unify the architecture across cloud, network, edge and end, so as to build genuine cloud-network-edge-end integration and make full use of resources such as computing power (a placement sketch follows this list).
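As a rough sketch of what "running a service dynamically in the cloud or on the end" could mean in practice, a scheduler might place each task according to its latency budget and compute demand; the latency and compute figures below are invented purely for illustration. The point of a unified cloud-network-edge-end architecture is that the same workload can move between these tiers without being rewritten.

```python
# Toy placement policy for a cloud-edge-end continuum.
# A task with a tight latency budget stays on the vehicle ("end");
# heavier, less latency-critical work moves to the edge or the cloud.

TIERS = [
    # (name, round-trip latency in ms, available compute in TOPS) -- all invented
    ("end",   0,   100),
    ("edge",  10,  1000),
    ("cloud", 50,  100000),
]

def place(latency_budget_ms, required_tops):
    """Pick the farthest tier that still meets the latency budget and has enough compute."""
    chosen = None
    for name, rtt_ms, tops in TIERS:
        if rtt_ms <= latency_budget_ms and tops >= required_tops:
            chosen = name  # keep the last (farthest) tier that qualifies
    return chosen

if __name__ == "__main__":
    print(place(latency_budget_ms=5,   required_tops=50))     # -> end
    print(place(latency_budget_ms=30,  required_tops=500))    # -> edge
    print(place(latency_budget_ms=200, required_tops=20000))  # -> cloud
```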

 

In the face of absolute computing power advantage, all customized chip solutions are meaningless.

 

4 Development Trends of Chips: From Separation to Fusion

 

 

Computing architecture is shifting from the separation into GPUs and DSAs back toward integration:

 

  • In the first stage, the CPU is a single general computing platform;


  • In the second stage, integration gives way to separation: heterogeneous computing platforms of CPU + GPU/DSA;


  • The third stage marks the turning point from separation back to integration, with heterogeneous computing platforms centered on the DPU;


  • In the fourth stage, separation fully gives way to integration: the many heterogeneous units are fused and re-architected into a more efficient super-heterogeneous computing platform.

 

 

The field of autonomous driving already has standalone, function-integrated single chips like Thor. In edge computing and cloud computing scenarios, can such standalone single chips be far behind?

 

Lightweight scenarios such as edge computing can be covered by standalone, function-integrated single chips; heavyweight scenarios such as cloud service hosts can realize a function-integrated "single chip" through chiplets.
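One reason chiplets make heavyweight, function-integrated chips feasible is manufacturing yield: several small dies yield far better than one huge monolithic die. A back-of-the-envelope calculation with a standard negative-binomial yield model illustrates the effect; the defect density, clustering parameter and die areas below are assumed values, not figures for any real product.

```python
# Negative-binomial die-yield model: Y = (1 + A * D0 / alpha) ** (-alpha)
# A     = die area in cm^2
# D0    = defect density in defects/cm^2   (assumed: 0.1)
# alpha = defect clustering parameter       (assumed: 3)

D0, ALPHA = 0.1, 3.0

def die_yield(area_cm2: float) -> float:
    return (1.0 + area_cm2 * D0 / ALPHA) ** (-ALPHA)

# One hypothetical 800 mm^2 monolithic die vs. four 200 mm^2 chiplets
monolithic = die_yield(8.0)
chiplet = die_yield(2.0)

print(f"monolithic 800 mm^2 die yield : {monolithic:.1%}")  # ~49%
print(f"single 200 mm^2 chiplet yield : {chiplet:.1%}")     # ~82%
# Chiplets are tested individually, so a package is assembled only from
# known-good dies; the wasted silicon per good product is far lower.
```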

 

5 Development trends of big chips in various fields

 

To put it simply, the development trend of large chips is toward a function-integrated, single-chip Macro-SoC (MSoC) built on a super-heterogeneous computing architecture.
