Nvidia Orin Price Estimates and In-depth Analysis-EEWORLD

The Qualcomm SA8295 is the chip used in General Motors' Ultra Cruise in 2023. It is essentially the automotive version of the Qualcomm Snapdragon 888. The Snapdragon 888 initially cost about US$240 and now costs about US$170 (Qualcomm's public filings list its MSM chip shipments and revenue, which work out to an average price of about US$30-35). Because most of the costs have already been covered by mobile phones, with shipments of 20 to 30 million units, the SA8295 can be priced very low.

However, Qualcomm's automotive chips are generally manufactured by TSMC, which charges much more than Samsung (TSMC's operating profit margin is almost four times that of Samsung's wafer foundry business). The SA8295 is therefore estimated at about US$150; if it were built on Samsung's 5nm process, the estimate would be US$100 to US$120. Qualcomm would also need to add an AI accelerator, but that is not expected to cost more than US$50. Taken together, Qualcomm still has a price advantage.

Orin's shipment volume is naturally not comparable to that of the Snapdragon 888, but given Samsung's mature process and the multiple versions that share the cost, such as the cockpit version and the game console version, its price is estimated at US$320. However, this unit price is not very meaningful. At present, L3/L4 smart driving vehicles are expensive and the technology iterates very quickly. Product life cycles keep getting shorter, and shipment volume over the whole life cycle is very small. The development cost spread over each vehicle far exceeds the ECU hardware cost. Manufacturers weigh the overall cost, especially software cost and one-time fees, and care little about the unit price of the SoC. Chip manufacturers likewise promote full solutions that include both hardware and software.
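The unit-price estimates above can be put side by side with a few lines of arithmetic. This is only a sketch using the article's own figures; the variable and function names are illustrative, and the accelerator is taken at its US$50 upper bound.

```python
# Per-chip price estimates quoted in the article (US$).
SA8295_TSMC = 150            # estimated SA8295 price if built at TSMC
SA8295_SAMSUNG = (100, 120)  # estimated range if built on Samsung 5nm
AI_ACCELERATOR_MAX = 50      # upper bound quoted for the add-on AI accelerator
ORIN = 320                   # estimated Orin price


def qualcomm_total(soc_price, accelerator_price=AI_ACCELERATOR_MAX):
    """SoC plus external AI accelerator, using the article's upper bound."""
    return soc_price + accelerator_price


tsmc_total = qualcomm_total(SA8295_TSMC)
samsung_totals = tuple(qualcomm_total(p) for p in SA8295_SAMSUNG)
gap_vs_orin = ORIN - tsmc_total

print(tsmc_total, samsung_totals, gap_vs_orin)
```

Even with the accelerator priced at its ceiling, the Qualcomm combination stays below the Orin estimate, which is the price advantage the text refers to.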

On November 9, 2021, NVIDIA officially launched the Orin-based module, Jetson AGX Orin, which means that individual users can also buy the top computing module in the field of autonomous driving. The previous-generation Jetson AGX Xavier was priced at $1,099 (it has since dropped to $699, or about 6,000 yuan including tax in China). The Jetson AGX Orin should not be priced too high: an estimated $1,499-1,799, falling to an estimated $1,299 in three years.

Image source: Internet

The module also includes 32GB of LPDDR5 with a bandwidth of 204.8GB/s, priced at about $105. LPDDR5 prices have risen recently, and even Apple's iPhone 13 uses LPDDR4 to save costs. The 64GB eMMC is very cheap, currently priced at $7. Other key chips include a QSPI NOR and a Secure NOR, neither of which is expensive, estimated at $5-8. There is also a power system.
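As a sanity check on the 204.8GB/s figure, peak memory bandwidth is the bus width in bytes multiplied by the transfer rate. The sketch below assumes a 256-bit bus running at 6400 MT/s; neither number appears in the article, so treat both as assumptions chosen because they reproduce the quoted bandwidth.

```python
def lpddr_bandwidth_gb_s(bus_width_bits, transfers_per_second):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


# Assumed, not stated in the article: 256-bit bus at 6400 MT/s.
bandwidth = lpddr_bandwidth_gb_s(256, 6400e6)
print(bandwidth)
```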

Image source: Internet

The internal framework diagram of Orin can be simply divided into five parts: storage, peripherals, CPU, GPU and accelerator.

Image source: Internet

Orin functional framework diagram

Image source: Internet

Part of the framework of the Orin CPU. The A78 here should be the A78AE (Automotive Enhanced), the A78 variant for the automotive field. ARM recommends that the A78 use a 5nm process and run at between 2.1GHz and 2.8GHz. To meet automotive-grade requirements, NVIDIA capped the operating frequency at 2GHz. For cost reasons, it did not use a 5nm process but Samsung's 8nm process, which is similar to TSMC's 10nm process.

NVIDIA abandoned its self-developed big and small core architecture and switched to ARM's cluster architecture, built around the DynamIQ Shared Unit (DSU) control unit that ARM introduced in 2017. It allows up to 8 CPU cores to form a cluster, and a single processor can contain up to 32 clusters. A processor can therefore have up to 256 cores (8 × 32) and can be expanded to 1,000 cores through the CCIX bus.

NVIDIA has not released a CPU framework diagram for Xavier; it should have two clusters of 4 cores each. Xavier's cache, however, is described in detail.

From the cache point of view, Orin seems more cost-conscious. The L1 and L2 caches are relatively small, but the L3 cache is not.

Image source: Internet

The internal framework of the A78AE. Apparently because of memory protection and lockstep, the L1 cache capacity is not high. The DSU can allocate caches at all levels and also controls the on/off state, frequency, and voltage of each CPU core in the cluster, which makes it the key to controlling CPU performance and power consumption. The DSU therefore has redundant control logic. This is the main difference from the consumer A78: the addition of the DSU-AE.

Image source: Internet

In partition mode, the DSU runs each cluster at full power. In lock-step mode, one core in each cluster stays in sleep mode as a backup; once an abnormality is detected, the backup is activated.

Image source: Internet

On the GPU side, each streaming multiprocessor (SM) contains 128 CUDA cores. With 16 SMs, that gives 2048 CUDA cores in total and a computing power of 4096 GFLOPS. There are also 64 tensor cores, with a computing power of 131 TOPS under the sparse INT8 model, or 54 TOPS under dense INT8.
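The CUDA core count and the GFLOPS figure follow from simple multiplication. The sketch below assumes each CUDA core completes one fused multiply-add (2 FLOPs) per cycle and that the GPU clock is 1.0 GHz; the clock is not given in the article and is simply the value implied by the 4096 GFLOPS figure.

```python
SM_COUNT = 16
CUDA_CORES_PER_SM = 128
TENSOR_CORES = 64
FLOPS_PER_CORE_PER_CYCLE = 2  # assumption: one fused multiply-add counts as 2 FLOPs
CLOCK_GHZ = 1.0               # assumption: clock implied by the 4096 GFLOPS figure

cuda_cores = SM_COUNT * CUDA_CORES_PER_SM
fp32_gflops = cuda_cores * FLOPS_PER_CORE_PER_CYCLE * CLOCK_GHZ
tensor_cores_per_sm = TENSOR_CORES // SM_COUNT

print(cuda_cores, fp32_gflops, tensor_cores_per_sm)
```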

Image source: Internet

The 64 tensor cores use the half-precision and integer matrix multiply-accumulate instruction sets, HMMA (Half-Precision Matrix Multiply and Accumulate) and IMMA (Integer Matrix Multiply and Accumulate), so that the GPU architecture can also handle dense algebraic operations and deep learning inference. NVIDIA uses fine-grained structured sparsity to convert dense trained weights into a sparse weight model. The sparsity constraint is that in every group of 4 weights, at least two must be zero. After this transformation, the storage needed for the weights is greatly reduced, and the tensor cores can skip the zero values, which doubles the speed.
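The constraint NVIDIA documents for this scheme is 2:4 structured sparsity: in every group of four weights, at most two are non-zero. The sketch below shows the idea in plain Python by keeping the two largest-magnitude weights per group and recording their positions. It is an illustration only, not NVIDIA's pruning tool; the function names and the magnitude-based selection rule are assumptions.

```python
def top2_positions(group):
    """Positions (0-3) of the two largest-magnitude weights, in ascending order."""
    ranked = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
    return sorted(ranked[:2])


def prune_2_of_4(weights):
    """Return (dense_pruned, kept_values, kept_positions) for a flat weight list."""
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    dense, values, positions = [], [], []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        keep = top2_positions(group)
        # Dense view: pruned weights become zero.
        dense.extend(w if i in keep else 0.0 for i, w in enumerate(group))
        # Compressed view: only the kept values and their positions are stored.
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return dense, values, positions


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
dense, values, positions = prune_2_of_4(weights)
print(dense)
print(values, positions)
```

The compressed form stores half as many values plus a small position index per value, which is where the reduced weight storage comes from; hardware that multiplies only the kept values does half the work per group.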

Image source: Internet

The internal framework of NVIDIA's deep learning accelerator (DLA), which is aimed at inference applications. Perhaps because NVIDIA considers it technically unremarkable, it introduces the DLA in just a few words, while the GPU, CPU, and PVA are introduced in detail. Indeed, a deep learning accelerator is technically simple: it is essentially an array of multiply-accumulate units. The improvement is an added 608KB buffer, which should in practice be 608KB of SRAM; it improves operating efficiency, since small models no longer need to read DRAM frequently. The performance is 97 TOPS per DLA under the sparse INT8 model, or 194 TOPS for the two DLAs combined. The previous-generation Xavier delivers 11.4 TOPS, but that figure is for a dense model.

Image source: Internet

The Programmable Vision Accelerator (PVA) is shown in the figure above. Compared with Xavier's first-generation PVA, 1MB of L2 is added, and the rest remains almost unchanged. The PVA is mainly used for vector operations such as filtering, warping, image pyramid generation, feature detection, and FFT. The specific applications are mainly stereo disparity, feature detection, feature tracking, and object tracking. It contains two 7-slot (two scalar, two vector, and three memory) VLIW vector processors, two DMA engines, and a real-time Cortex-R5.

Image source: Internet

A typical application of the PVA is the stereo disparity pipeline. It is worth pointing out that NVIDIA has been promoting VPI. Vision Programming Interface (VPI) is NVIDIA's high-performance computer vision and image processing algorithm library. VPI provides a unified interface to various hardware backends, such as the CPU, GPU, Programmable Vision Accelerator (PVA), and Video Image Compositor (VIC), and provides convenient GPU parallel functions. Supported algorithms include Gaussian pyramid generator, Laplacian pyramid, separable image convolver, box image filter, Gaussian image filter, bilateral image filter, image rescaling, image remapping, image histogram, histogram equalization, fast Fourier transform, inverse fast Fourier transform, image format converter, perspective warping, background subtraction, lens distortion correction, temporal noise reduction, pyramid LK optical flow, and other commonly used algorithms. NVIDIA seems to intend VPI as a replacement for OpenCV. On NVIDIA's computing platform, VPI is significantly faster than OpenCV.

Some algorithms, such as Separable Convolution, run up to 29 times faster. NVIDIA has monopolized deep learning with CUDA, and its next goal is to monopolize computer vision algorithms with VPI.
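To make the Separable Convolution example concrete, the sketch below shows what the algorithm does, in plain Python rather than through VPI (it does not use or reproduce any VPI API, and it says nothing about the 29-times figure, which depends on hardware). A 2D kernel that is the outer product of a column vector and a row vector can be applied as two 1D passes, cutting the multiplies per output pixel from k × k to 2k. All names here are illustrative.

```python
def conv2d_valid(image, kernel):
    """Direct 2D correlation over the 'valid' region: k*k multiplies per output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w - kw + 1)]
            for y in range(h - kh + 1)]


def separable_conv2d_valid(image, col, row):
    """Same result as a horizontal 1D pass followed by a vertical 1D pass."""
    h, w = len(image), len(image[0])
    kr, kc = len(row), len(col)
    # Horizontal pass: filter every image row with the 1D row kernel.
    horizontal = [[sum(image[y][x + j] * row[j] for j in range(kr))
                   for x in range(w - kr + 1)]
                  for y in range(h)]
    # Vertical pass: filter every column of the intermediate result.
    return [[sum(horizontal[y + i][x] * col[i] for i in range(kc))
             for x in range(len(horizontal[0]))]
            for y in range(h - kc + 1)]


# A 3x3 smoothing kernel written as the outer product of two 1D kernels.
col = [1, 2, 1]
row = [1, 2, 1]
kernel = [[c * r for r in row] for c in col]

# Small synthetic integer image (5 rows, 6 columns).
image = [[(3 * y + 5 * x) % 7 for x in range(6)] for y in range(5)]

direct = conv2d_valid(image, kernel)
separable = separable_conv2d_valid(image, col, row)
print(direct == separable)
```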

Image source: Internet

In terms of interfaces, Orin provides up to 6 CSI camera interfaces, which may not seem like much, but the number can be increased to 16 through virtual channels. Autonomous driving systems generally use dual Orin: the 16 MIPI CSI channels of one Orin correspond to four 8-megapixel cameras, so dual Orin supports eight 8-megapixel cameras.

Image source: Internet

The interfaces basically correspond to the architecture in the picture above, with 16 4-megapixel cameras, 8 lidars, and one 1G Ethernet port. Two 10G Ethernet ports connect to the backbone network and switches. Orin's AI computing power comes mainly from the DLA, whereas Xavier's comes mainly from the GPU. Judging from the simplified die shot, the next-generation Atlan should return to the Xavier route, with AI computing power coming mainly from the GPU. The GPU occupies much more area than the DLA, and because a DPU module has been added, the DLA area has been greatly compressed. The code name of the next-generation GPU architecture may be Ada Lovelace. Ada Lovelace, a mathematician and the daughter of the famous British poet Byron, is regarded as the first programmer in history.

Orin is far from perfect, especially its CPU. With the series of new technologies ARM has released since the A78, Apple, Samsung, Intel, and even MediaTek are capable of challenging Orin. The problem is that the L3/L4 smart car market is too small compared with mobile phones and PCs, and latecomers would have to spend heavily on software to offer a full solution, which leaves Orin with a near monopoly. If domestic chips want to challenge Orin, they must license ARM's most advanced architecture and use a process of 5 nanometers or better, which means a one-time cost of at least $100 million. The overall development cost of the chip is expected to exceed $200 million. Even with lifetime shipments of 100,000 vehicles, the development cost alone comes to $2,000 per SoC ($200 million / 100,000). Obviously, this price is unacceptable to car manufacturers. It is all but impossible for any company to challenge Orin in the automotive market alone.
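The amortization argument above is a single division, sketched here with the article's figures. The $200 million is used as a lower bound on development cost, the $320 comparison point is the article's Orin price estimate, and manufacturing cost is ignored; the function name is illustrative.

```python
def amortized_cost_per_unit(development_cost_usd, lifetime_units):
    """Development cost spread evenly over every unit shipped."""
    return development_cost_usd / lifetime_units


DEVELOPMENT_COST = 200_000_000  # article's lower bound for total development cost
LIFETIME_UNITS = 100_000        # article's lifetime shipment assumption
ORIN_PRICE = 320                # article's estimated Orin unit price

per_soc = amortized_cost_per_unit(DEVELOPMENT_COST, LIFETIME_UNITS)

# Units needed before amortized development cost alone falls to Orin's estimated price.
break_even_units = DEVELOPMENT_COST / ORIN_PRICE

print(per_soc, break_even_units)
```

On these assumptions a challenger would need to ship roughly six times the assumed volume just to bring amortized development cost down to Orin's estimated selling price, before any manufacturing cost is counted.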
