In 2024, automotive chips will usher in a new decade
[Introduction] Vehicle-mounted smart chips, however they are defined, will change the chip industry in the next decade just as smartphone chips did in this one. When we design the next generation of automotive smart chips, can general-purpose automotive chips keep up with ever-evolving autonomous driving algorithms? How should we weigh efficiency and robustness, two almost contradictory metrics? We hope this article will inspire readers.
In 2016, I had just come to the United States to study and bought my first car, a black Nissan Altima built in 2012. There was not a single display screen in the entire car, and it had no driver-assistance functions such as car following or lane keeping. I could not even tell which chips in the car reflected the past thirty years of progress in the semiconductor industry, and the combined computing power of all its chips was far below that of a flagship smartphone of the time.
I suspect that even people in the automotive industry at the time would have found it hard to imagine that three years later, in 2019, Tesla would pack a 14nm chip with a 12-core ARM A72 CPU, an ARM Mali G71 GPU, 2 neural network accelerators, and a large number of other hardware acceleration units, with a total computing power of more than 600 GFlops, into every Tesla coming off the production line.
Running on this server-class computing platform is the "fully autonomous driving" system designed by Tesla. However controversial it may be, Tesla and a large number of other car companies are leading us down the road to autonomous driving without looking back.
Today, three years later, a car chip with a computing power of 600 GFlops can no longer be called an industry leader. Tesla's latest car chip has reached 72 TFlops. Self-driving algorithms have also changed considerably, with Transformers replacing traditional CNNs (convolutional neural networks). More sensors, including lidar and multi-camera inputs, are widely used. The industry has developed so quickly that the most advanced autonomous driving systems of three years ago, in both hardware and software, now look like a BlackBerry next to a new iPhone, left behind by the tide of the times.
But the industry as a whole is still advancing. The tasks of autonomous driving have become more complex, evolving from basic L2 assisted driving toward the more ambitious L4 and L5 levels of fully autonomous driving. More complex autonomous driving algorithms follow, and algorithms drive hardware evolution. For the grand subject of autonomous driving, we researchers are not standing at "the beginning of the end"; we have only just reached "the end of the beginning".
Universality and specificity: the life of an operator
Smart vehicle chips, like smartphone chips, combine generality with specialization. At the architectural level, a multi-core general-purpose processor and multiple specialized hardware units together constitute a system-on-chip. Specialized hardware such as neural network accelerators provides a large amount of computing power to modules such as the visual recognition system in autonomous driving software.
However, dedicated hardware still cannot escape the question of generality. In an era where algorithms evolve much faster than hardware, how can one hardware design serve different software algorithms? Neural network accelerators provide a good example: they use one or more general operators to implement different networks for the same task, or even use the same hardware for different tasks.
For complex autonomous driving software, the perception module with deep learning at its core is only one part. The remaining modules, such as positioning, planning, and control, do not yet have a clear operator for hardware designers to build on. Although some research [1] tries to find a common computing model across multiple positioning algorithms, the vast majority of work on dedicated positioning or path planning acceleration units for smart vehicle chips still simply customizes hardware around one or a few specific algorithms. When the autonomous driving software is updated to the next version, this dedicated hardware is likely to become unusable.
For the selection of general operators, hardware designers need to consider the following two dimensions:
First, horizontal adaptation across different algorithms. Different service providers are likely to use completely different algorithms for the same module in their autonomous driving software. By choosing operators that are common across these algorithms, hardware designers can build hardware that serves different service providers.
Second, vertical adaptation to the continuous evolution of algorithms. Algorithms iterate far faster than hardware, and in-vehicle smart chips are difficult to update once they are installed in the car. Designing hardware platforms that can serve multiple generations of algorithms is therefore also something we need to consider.
The operator itself can be one or more simple operations, or it can be a more complex intermediate representation. Among these, the factor graph [2] is gradually gaining attention as a general-purpose representation.
As a type of probabilistic graphical model, a factor graph represents the product of a series of probability distributions. It contains two types of nodes, factor nodes and variable nodes, and each edge connects a factor node to a variable node that the factor depends on. A large number of optimization-based algorithm modules in autonomous driving software, including positioning algorithms represented by SLAM [3], tracking algorithms [4] in the perception module, and control algorithms [5], can all be expressed as factor graphs and solved in that form. For example, the figure below is a simple action planning algorithm represented by a factor graph. The left figure includes five different states, the right figure shows the matrix A being solved, and the dotted lines show the correspondence between the factors and the elements of the matrix.
A simple action planning algorithm represented by factor graph
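As a rough illustration of how factors map to rows of the matrix being solved, the sketch below builds a small linear system from a chain of five states and solves it by least squares. It is not the planning problem shown in the figure: the `Factor` class, the helper names, and the one-dimensional "move one unit per step" model are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    """A linear factor: sum(coeff * x[var]) should equal target (hypothetical form)."""
    coeffs: dict   # variable index -> coefficient
    target: float


def build_system(factors, num_vars):
    """Stack one row of A and one entry of b per factor.

    A row is non-zero only in the columns of the variables its factor touches,
    which is the factor-to-matrix correspondence described in the text.
    """
    A = [[0.0] * num_vars for _ in factors]
    b = [0.0] * len(factors)
    for row, factor in enumerate(factors):
        for var, coeff in factor.coeffs.items():
            A[row][var] = coeff
        b[row] = factor.target
    return A, b


def solve_least_squares(A, b):
    """Solve the normal equations (A^T A) x = A^T b by Gaussian elimination."""
    rows, n = len(A), len(A[0])
    H = [[sum(A[k][i] * A[k][j] for k in range(rows)) for j in range(n)]
         for i in range(n)]
    g = [sum(A[k][i] * b[k] for k in range(rows)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(H[r][col]))
        H[col], H[pivot] = H[pivot], H[col]
        g[col], g[pivot] = g[pivot], g[col]
        for r in range(col + 1, n):
            m = H[r][col] / H[col][col]
            for c in range(col, n):
                H[r][c] -= m * H[col][c]
            g[r] -= m * g[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = g[i] - sum(H[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / H[i][i]
    return x


# Five states on a line: a prior pins x0 at 0, and each motion factor
# asks the next state to be one unit further along (assumed toy model).
factors = [Factor({0: 1.0}, 0.0)]
factors += [Factor({i: -1.0, i + 1: 1.0}, 1.0) for i in range(4)]

A, b = build_system(factors, num_vars=5)
x = solve_least_squares(A, b)
print([round(v, 6) for v in x])
```

Because every factor touches at most two neighbouring states, A is sparse and banded. Regular structure of this kind is one reason a factor graph is attractive as a common operator for hardware.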
Of course, the construction and solution of factor graphs may differ between algorithms, but this does not prevent the factor graph from being explored as a potential universal representation. Based on this, we used the factor graph as an intermediate representation and designed dedicated hardware for the SLAM positioning algorithm [6]. We then applied the factor graph to path planning and control algorithms, and tried to design general-purpose acceleration hardware based on the factor graph that accelerates the algorithms of multiple modules.
Automation: operator extraction and hardware generation
In any era, designing hardware is expensive. A commercially usable dedicated hardware acceleration unit requires several months of development, design, and verification by the hardware design and testing teams before it can be put into use. This is why in-vehicle intelligent hardware iterates much more slowly than software. Moreover, unlike traditional operating systems and other general-purpose software, whose new versions are usually compatible with one or more previous generations of hardware platforms, a new version of in-vehicle intelligent software is likely to leave the previous-generation platform far behind.
Hardware designers have no choice but to rely on agile, fast, and automated development. Traditional hardware generation (High-Level Synthesis) has many problems: it cannot optimize for the underlying multi-core or heterogeneous architecture, it does not understand the core bottleneck of the algorithm, and it still requires a lot of manual adjustment and modification by hardware engineers.
When we have a suitable operator as an intermediary to design hardware, the automation of hardware design becomes closer to reality. This can be divided into the following two parts:
The first part is to extract operators from existing or new algorithms, which is not easy. Software developers do not consider the underlying hardware when designing algorithms; their only goals are correctness and efficiency. From a hardware designer's perspective, many algorithms used in autonomous driving modules are irregular and unstructured. Extracting a fixed operator from one or more such algorithms, and automating that process, is very challenging. Taking our own research as an example, even after settling on the factor graph as a unified intermediate representation, a large amount of manual design is still required when building accelerators for the different algorithms that share it.
The second part is to generate hardware automatically from operators. In our work [7], we have begun to use automated or semi-automated methods to design hardware for autonomous driving software. The figure below shows the hardware architecture of a semi-automatically generated back end for a positioning algorithm. The three hardware modules, D-type Schur elimination, M-type Schur elimination, and Cholesky decomposition, are automatically generated.
Hardware architecture of a semi-automatically generated positioning algorithm back end
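For readers less familiar with Schur elimination and Cholesky decomposition, the sketch below shows the generic software form of both on a small block system: one block of unknowns is eliminated with a Schur complement, and the reduced system is solved by Cholesky factorization. It is only an illustration under assumptions made for the example (a diagonal lower-right block, hypothetical function names, made-up numbers). It does not reproduce the D-type and M-type variants or the generated hardware described in [7].

```python
import math


def cholesky(S):
    """Return lower-triangular L with L L^T = S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_solve(S, r):
    """Solve S x = r using forward and backward substitution on the Cholesky factor."""
    L = cholesky(S)
    n = len(r)
    z = [0.0] * n
    for i in range(n):                       # forward: L z = r
        z[i] = (r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def schur_solve(A, B, d, f, g):
    """Solve [[A, B], [B^T, diag(d)]] [x; y] = [f; g] by eliminating y first.

    Reduced system: (A - B D^-1 B^T) x = f - B D^-1 g, with D = diag(d).
    """
    n, m = len(A), len(d)
    S = [[A[i][j] - sum(B[i][k] * B[j][k] / d[k] for k in range(m))
          for j in range(n)] for i in range(n)]
    r = [f[i] - sum(B[i][k] * g[k] / d[k] for k in range(m)) for i in range(n)]
    x = cholesky_solve(S, r)
    # Back-substitute to recover the eliminated unknowns.
    y = [(g[k] - sum(B[i][k] * x[i] for i in range(n))) / d[k] for k in range(m)]
    return x, y


# Made-up 4x4 example, built so that the exact answer is x = [1, 2], y = [3, 4].
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
d = [2.0, 2.0]
f = [9.0, 11.0]
g = [7.0, 10.0]

x, y = schur_solve(A, B, d, f, g)
print(x, y)
```

Eliminating the diagonal block first is cheap because inverting it only needs element-wise division, and the Cholesky step then works on a smaller matrix.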
In contrast to High-Level Synthesis, we usually first design an optimized hardware template for a general operator by hand. This template circuit can then be reused in the algorithm-specific hardware of any algorithm that uses the operator. At the same time, we can dynamically adjust the hardware design based on the data volume and the complexity of the scenario.
Although many steps are already automated, there is still a lot of work to do on the road to more agile hardware development. For example, beyond the hardware circuits that correspond to general operators, algorithms and software contain a large number of other operations whose circuits we still have to design by hand. The data storage scheme of dedicated hardware also needs to be customized manually according to the computing pattern.
Efficiency or robustness?
Unlike smartphone or server chip designs, which focus on efficiency as the main metric, smart vehicle chips must complete the required calculations in real time while also being sufficiently fault tolerant, that is, the system must be robust. The robustness of the vehicle system bears directly on personal safety, and it is also the key to whether fully autonomous driving can truly be deployed.
However, robustness and efficiency are in fact contradictory metrics. Traditional fault-tolerant computing usually ensures robustness through simple means such as adding backup resources. Whether it duplicates circuits in space or repeats calculations in time, it tries to guarantee correct results by performing the same calculation more than once. This approach inherently works against the efficiency of the computing system. Duplicating circuits in space increases chip area and power consumption, and repeating calculations in time adds latency, which may leave the vehicle unable to respond to environmental changes within a short enough delay.
Automotive chip designers such as Tesla and ARM have begun to deploy redundant systems in real products. Tesla's "fully autonomous driving" system uses two identical pieces of hardware running the same autonomous driving software, each serving as a backup for the other. ARM's A-series chips designed for autonomous driving also provide a lockstep option: two CPU cores run the same task, and their output signals are compared in every clock cycle. If the outputs differ, an error has occurred in one of the cores, and an alarm is raised so that the system designer can decide how to respond.
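The per-cycle comparison described above can be sketched in software as follows. This is a simplified model rather than ARM's implementation: the "cores" are plain Python functions, the workload is an arbitrary stand-in, and the fault is a single simulated bit flip. All names are hypothetical.

```python
def run_lockstep(core_a, core_b, inputs):
    """Feed the same inputs to two cores and compare their outputs every cycle.

    Returns (outputs, fault_cycle); fault_cycle is None if no mismatch was seen.
    """
    outputs = []
    for cycle, value in enumerate(inputs):
        out_a = core_a(cycle, value)
        out_b = core_b(cycle, value)
        if out_a != out_b:
            # The checker only knows that the cores disagree, not which one is
            # wrong, so it stops and reports the cycle for the system to handle.
            return outputs, cycle
        outputs.append(out_a)
    return outputs, None


def healthy_core(cycle, value):
    """Stand-in workload: an arbitrary 8-bit computation (assumed for the example)."""
    return (value * 3 + 1) & 0xFF


def make_faulty_core(fault_cycle, bit):
    """Build a core that flips one output bit at a chosen cycle (simulated soft error)."""
    def core(cycle, value):
        out = healthy_core(cycle, value)
        return out ^ (1 << bit) if cycle == fault_cycle else out
    return core


inputs = [5, 9, 12, 20]

clean_outputs, clean_fault = run_lockstep(healthy_core, healthy_core, inputs)
partial_outputs, detected_at = run_lockstep(
    healthy_core, make_faulty_core(fault_cycle=2, bit=4), inputs
)
print(clean_fault, detected_at)
```

The sketch also makes the cost visible: every cycle of useful work is executed twice, which is the efficiency penalty discussed in the previous paragraph.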
We hope to build a bridge between efficiency and robustness and provide a solution that not only ensures robustness but also minimizes the loss of efficiency. In this regard, we are inspired by the complexity of software systems [8] .
Autonomous driving software is a complex system with dozens of algorithms spread across multiple modules. When different algorithms encounter errors, their effect on the overall system also differs completely. We found that some algorithms are, by design, naturally very robust to errors. For example, because algorithms in the perception module fuse multiple sensor branches, an error on a single sensor path may be corrected by the perception results of other sensors. Similarly, some algorithms integrate the input signal during operation and then add it to or subtract it from the previous value to obtain a new result. This type of algorithm also tolerates errors well: when its input signal is wrong, its output may still not affect the final behavior of the system.
Once this information has been extracted, exploiting the differing robustness within autonomous driving software can help hardware designers significantly reduce the hardware-level burden while still ensuring robustness. For example, when performing redundant computation, hardware designers can back up only part of the hardware and schedule modules that are strongly affected by errors (such as control modules) onto the backed-up hardware. Modules that are insensitive to errors and highly fault tolerant can be scheduled onto hardware without backup to run faster.
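A minimal sketch of this scheduling idea is shown below. Everything in it is hypothetical: the module list, the sensitivity labels (which in practice would come from an error analysis such as the one in [8]), and the work figures, which are arbitrary units chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    error_sensitive: bool  # assumed result of an offline error-sensitivity analysis
    work: float            # hypothetical compute cost in arbitrary units


def plan_selective_redundancy(modules, copies=2):
    """Place error-sensitive modules on backed-up hardware and the rest on plain hardware.

    Returns (placement, selective_work, full_work). The work totals count every
    redundant execution, so they can be compared against full duplication.
    """
    placement = {
        m.name: "redundant" if m.error_sensitive else "single"
        for m in modules
    }
    selective_work = sum(
        m.work * (copies if m.error_sensitive else 1) for m in modules
    )
    full_work = sum(m.work * copies for m in modules)
    return placement, selective_work, full_work


# Made-up modules and costs. Only the perception and control labels follow the
# text; the planning label is an assumption for illustration.
modules = [
    Module("perception", error_sensitive=False, work=6.0),
    Module("planning", error_sensitive=True, work=1.5),
    Module("control", error_sensitive=True, work=0.5),
]

placement, selective_work, full_work = plan_selective_redundancy(modules)
print(placement)
print(selective_work, full_work)
```

With these made-up numbers, duplicating only the sensitive modules costs 10 units of work instead of 16 for full duplication. The actual saving depends entirely on how much of the workload turns out to be error tolerant.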
The next decade of automotive smart chips
After rapid development over the past few years, vehicle-mounted smart chips have received a lot of attention and have been deployed in real products. But in both academia and industry, quite a few problems still need to be solved for their future development.
The first is the programmability of on-board smart chips. The same hardware manufacturer may serve different car manufacturers. Generally speaking, the design logic of autonomous driving software is different between different car manufacturers. Hardware designers are motivated to design a common programming model for their own hardware that can be used by different software service providers. This programming model can better tap the hardware computing power while providing an interface for software designers.
The second is multi-vehicle and vehicle-to-road communication. Vehicle-road collaboration and multi-vehicle collaboration are considered key steps toward the highest level of autonomous driving. Sharing information with other vehicles and roadside processing units can better assist vehicles in making decisions. Even if hardware designers and software providers can agree on unified interfaces for vehicle-road and multi-vehicle collaboration, many problems with this idea remain.
For example, as in cloud computing, a key issue in vehicle-vehicle and vehicle-road collaboration is the privacy of personal data. Are car owners willing to share their vehicle information with others? What security risks arise after sharing? Will privacy-preserving computation play an important role? These are questions that academia and industry urgently need to resolve.
Overall, without any extra publicity or marketing, autonomous driving and its hardware design will become the "trend" of our era. However, practitioners still need to come up with new ideas to solve problems such as how to design specialized hardware with high enough efficiency, how to achieve agile development, and how to ensure robustness while taking into account efficiency. As researchers on related topics, we also look forward to working with you to provide answers to these questions.
References
[1] Gan, Yiming, et al. "Eudoxus: Characterizing and accelerating localization in autonomous machines industry track paper." 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2021.
[2] Loeliger, Hans-Andrea, et al. "The factor graph approach to model-based signal processing." Proceedings of the IEEE 95.6 (2007): 1295-1322.
[3] Zhang, Yanhao, Teng Zhang, and Shoudong Huang. "Comparison of EKF based SLAM and optimization based SLAM algorithms." 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA). IEEE, 2018.
[4] Schoellig, Angela P., Fabian L. Mueller, and Raffaello D'Andrea. "Optimization-based iterative learning for precise quadrocopter trajectory tracking." Autonomous Robots 33.1 (2012): 103-127.
[5] Rawlings, James B., and Brett T. Stewart. "Coordinating multiple optimization-based controllers: New opportunities and challenges." Journal of process control 18.9 (2008): 839-845.
[6] Hao, Yuhui, et al. "Factor Graph Accelerator for LiDAR-Inertial Odometry." Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design. 2022.
[7] Liu, Weizhuang, et al. "Archytas: A framework for synthesizing and dynamically optimizing accelerators for robotic localization." MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. 2021.
[8] Gan, Yiming, et al. "Braum: Analyzing and protecting autonomous machine software stack." 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2022.