How does intelligent driving embrace large models?-EEWORLD

In just a few months, large models represented by ChatGPT have surged in popularity, iterating rapidly and showing the potential to sweep across every industry. When this wave reached the field of intelligent driving, it brought some panic, but also new directions.

One is a language model in the cloud that aims at generalization and multi-tasking; the other is an intelligent driving system running in a public traffic environment. According to Yin Wei, senior software manager of Zhiji Auto Intelligent Driving Center, the two are different paths toward a prototype of AGI (artificial general intelligence). ChatGPT starts from the "cloud" and moves toward "trustworthiness", while intelligent driving starts from the "end" (the vehicle side) and moves toward "universal use."

How will they intersect on the road to AGI? What guidance do large models offer for the development of intelligent driving? In what direction will intelligent driving evolve in the future?

At the 2023 China (Yizhuang) Intelligent Connected Vehicle Technology Week and the 10th International Intelligent Connected Vehicle Technology Annual Conference (CICV2023), Yin Wei shared his thoughts.

The following is compiled from a transcript of Yin Wei's speech, with minor abridgement:

1. Where do large models and intelligent driving intersect?

Yin Wei, senior manager of software at Zhiji Auto Intelligent Driving Center

1) Perception, fusion, and prediction

ChatGPT and intelligent driving are both systems, and any study of systems comes down to two issues. One is generalization (the ability of a trained model to be applied to new data and still make accurate predictions, so that it covers a wide range of scenarios); the other is reliability. Generalization is how a system copes with uncertainty, and reliability is how it maintains certainty.

Until recently, only image perception in intelligent driving used models, while everything else used rule-based algorithms. Now, apart from planning and control, perception, fusion, and prediction can all be fully model-based.

Studying how a large on-vehicle model can govern the generalization of the entire software stack is very valuable for handling corner cases, and it is also a development trend.

2) Data engine

A closed data loop is essential for both ChatGPT and smart driving. In the past, discussion of the closed data loop involved many changes to the model itself. Recently the focus has shifted to how to use the model to produce results, then use those results to produce the next model, and keep cycling like nested matryoshka dolls.

ChatGPT is in fact similar. From 1.0 to 4.0, its overall structure, like the neurons in a brain, has not changed much. What has changed greatly is the learning material used for training, much as teaching materials change with every education reform, and this is what has made the "brain" faster.
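The cycle described in this section, in which the model produces results and the results produce the next model, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the one-dimensional "model" (two class means), the function names `train`, `predict`, and `closed_loop`, and the `min_margin` threshold are hypothetical and do not come from the speech. Note that the model structure never changes between rounds; only the training data does.

```python
def train(samples):
    """Fit a toy 'model': the mean of each class for 1-D inputs."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in samples if y == label]
        means[label] = sum(xs) / len(xs)
    return means


def predict(model, x):
    """Return (label, margin): the nearest class mean and how decisive it was."""
    d0, d1 = abs(x - model[0]), abs(x - model[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)


def closed_loop(labeled, unlabeled, rounds=3, min_margin=1.0):
    """Model -> results -> model: auto-label confident samples, then retrain."""
    data = list(labeled)
    pool = list(unlabeled)
    model = train(data)
    for _ in range(rounds):
        confident, remaining = [], []
        for x in pool:
            label, margin = predict(model, x)
            target = confident if margin >= min_margin else remaining
            target.append((x, label))
        if not confident:
            break  # nothing new to learn from in this round
        data += confident                    # results feed the next model
        pool = [x for x, _ in remaining]     # ambiguous samples stay unlabeled
        model = train(data)                  # same structure, new data
    return model, pool


model, leftover = closed_loop([(0.0, 0), (10.0, 1)], [1.0, 9.0, 5.2])
print(model, leftover)
```

In this sketch, the samples left in `leftover` are the ones the model could not label with confidence; in a real pipeline these would be the candidates for human review.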

3) Transformer

The Transformer is the "T" in GPT and is also widely discussed in the field of intelligent driving. It is a deep learning model that uses the attention mechanism to speed up model training, and it consists of two parts: an encoder and a decoder.
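As a minimal sketch of the attention mechanism mentioned above, the following computes scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It is an illustration only: it omits the learned projections, multiple heads, and masking that a real Transformer uses, and the function names are chosen here for readability.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two identical keys receive equal weight, so the output is the mean value.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))
```

Because every query attends to all keys in one step, the computation for different positions can run in parallel, which is one reason attention-based models train quickly.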

Intelligent driving systems and large models both chose the Transformer without consulting each other. It may look like a coincidence, but there are objective reasons behind it. The Transformer supports a degree of causal reasoning over spatio-temporal logic, and at least for now it is unifying the development strategy of intelligent systems as a whole.

Intelligent driving systems are currently at the large-encoder stage of development, and their use of decoders is still being worked out. Large models, however, have already entered the decoder stage, which offers a useful reference for the future development of Transformers in intelligent driving.

4) Multimodality

Large models based on language and text are popular today, and models based on images, video, and voice are starting to catch on. The modality that has not yet emerged is the large behavioral model, combined with robots. Once large models reach the behavioral stage, they will enter the same field of discussion as intelligent driving.

However, intelligent driving systems will develop a little differently. Most current discussion concerns BEV (bird's-eye-view) models for perception from sensors such as cameras and lidar. There is also discussion of how to use models, during map prediction, to build a topological mapping of what used to be the high-precision map. All of these are inputs to planning and decision-making in intelligent driving. In this respect, the breakthroughs of language models are a strong reference for the future development of planning in intelligent driving systems.

5) Equal rights

Both smart driving and large models have brought up this term recently, but the logic of equal rights differs between the two.

In intelligent driving, talk of equal rights is mostly about cost reduction: how to keep costs falling under rapid iteration, how to implement a fully centralized architecture, and how to reduce the marginal cost of software. When adapting to new models, new algorithms, and new business conditions, software changes must be kept to a minimum. Moving to model-based approaches does contribute a great deal here.

Of course, it also brings new problems. If a model's input source changes, costs may rise sharply.

In the field of large models, however, equal rights is more about ownership: who controls such terrifying productivity.

2. Jumping repeatedly between certainty and uncertainty

To study the workflows of large models and intelligent driving, it is necessary to understand the processes they go through in dealing with generalization and reliability.

ChatGPT's business is inherently self-explanatory, oriented to multi-task scenarios, and highly fault-tolerant. Its training process runs from initial unsupervised learning, which requires the largest volume of data, through structured fine-tuning with supervised learning, and then to reinforcement learning. The results produced after this training are already usable.

However, to be truly used in a workflow, it still needs prompt engineering (using prompts that the AI can understand to help it grasp requirements efficiently and carry out functions) before it can deliver its productivity value.

ChatGPT's overall development runs from generalization and high fault tolerance toward values that look a lot like those of smart cars, such as controlling latency, reducing computing power requirements, and improving the authenticity and controllability of interactions. It belongs to a paradigm that emphasizes uncertainty and expects the process to yield answers and new ideas. People give only a guideline and do not force a particular result.

But the development path of smart cars, and what we want to do next, is actually the opposite of ChatGPT's.

In smart cars, no matter how small the system is at the start, it is in effect a robot operating in a public traffic environment where lives are at stake, so safety and reliability must be emphasized. Intelligent driving systems belong to a paradigm that emphasizes determinism. A model must first reach a required level of safety before it is allowed to take control.

The two approaches differ greatly. People used to deterministic, rule-based practices and people used to uncertain practices may not understand each other at all. Yet implementing an intelligent driving system is a process of jumping repeatedly between certainty and uncertainty.

The biggest role of the deterministic workflow in a product is to ensure short-term product quality and to guarantee mass production. It also gives the uncertain workflow the ability of "difference identification". Using a deterministic method or safety system to control the boundaries, while leaving some room for deep learning inside them, is a solution that is easier to operate in mass production.
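One way to picture a deterministic boundary around a learned component is a rule-based envelope that clamps whatever the model proposes. The sketch below is purely illustrative: the class name `SafetyEnvelope`, its fields, and all numeric limits are hypothetical values chosen for the example, not figures from the speech or from any production system. The `overridden` flag is one possible reading of "difference identification": it records when the rules and the model disagree.

```python
from dataclasses import dataclass


@dataclass
class SafetyEnvelope:
    """Deterministic limits wrapped around a model's proposed acceleration."""
    max_accel: float = 2.0      # m/s^2, illustrative upper bound
    max_decel: float = -6.0     # m/s^2, illustrative lower bound
    min_time_gap: float = 1.5   # s, illustrative following distance in time
    brake_accel: float = -3.0   # m/s^2, applied when the gap is too small

    def apply(self, proposed_accel, speed, gap):
        """Return (command, overridden) for a proposal from the model.

        speed is the ego speed in m/s; gap is the distance to the lead
        vehicle in metres.
        """
        command = proposed_accel
        # Rule 1: if following too closely, require at least moderate braking.
        if speed > 0 and gap / speed < self.min_time_gap:
            command = min(command, self.brake_accel)
        # Rule 2: never leave the allowed acceleration range.
        command = max(self.max_decel, min(self.max_accel, command))
        # Flag any disagreement between the model and the rules.
        return command, command != proposed_accel


envelope = SafetyEnvelope()
print(envelope.apply(1.0, speed=20.0, gap=100.0))  # inside the envelope
print(envelope.apply(5.0, speed=20.0, gap=100.0))  # clamped to max_accel
print(envelope.apply(1.0, speed=20.0, gap=20.0))   # gap too small, brakes
```

Inside the envelope the model's output passes through unchanged, which is the "space for deep learning"; the logged overrides could then feed back into the uncertain workflow as cases worth retraining on.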

Uncertain workflows have a profound impact on the long-term iteration of products, and they can help deterministic workflows improve efficiency and relieve pressure.

This repeated jumping between certainty and uncertainty forms a kind of spiral. Changes in both will bring about a leap in the cognitive dimension of the intelligent driving system.

3. Where will the future jump?

Users show a high willingness to pay for ChatGPT, and its products iterate quickly. The commercialization of smart driving is much slower by comparison, which has led a lot of capital to flow from smart driving into ChatGPT.

The development trajectories of the two are bound to differ. Although both end at widespread trust, AIGC represented by ChatGPT goes from widespread use to widespread trust, while intelligent driving goes from trust to widespread trust.

AIGC has exploded extremely rapidly in this period: from language, to multi-modality, to customization for professional fields, to the use of many tools, and finally to the involvement of robots, all of which will happen very quickly. But AIGC will not always develop this fast, and the day will come when it slows down.

When will it slow down? When it faces some of the same problems as intelligent driving. Once decision-making in highly sensitive areas is involved, for example when robots enter the field of public safety, it will definitely slow down.

There may be three stages between ChatGPT and intelligent driving.

The first stage is panic, which is what it feels like now.

In the second stage, the LLM (large language model) business will begin to guide the practice of intelligent driving engineers. Many intelligent driving workflows are already moving in this direction: from perceiving everything, to prediction and planning, to self-explanation.