Don’t get too caught up in large models for autonomous driving

Publisher: Turquoise | Last updated: 2024-02-06 | Source: 汽车通讯社

The CES show in the United States has long been an important barometer of global technology trends. At CES 2024, two leading figures in artificial intelligence, Fei-Fei Li and Andrew Ng, held a conversation in which they made a point that could reshape the autonomous driving industry.




That point is this: large AI models have begun to shift from "large language models" to "large vision models." A large model can now not only understand language but also generate and analyze images, allowing computers to grasp what an image actually means, and this could bring a qualitative leap to autonomous driving.


With that in mind, this article is divided into two parts:


Why do Fei-Fei Li and Andrew Ng say the "large vision model" will bring a qualitative leap to autonomous driving?


Why shouldn't we get too caught up in large AI models for autonomous driving?


Will large vision models bring a disruptive revolution?


When a person drives a vehicle, he or she does not simply hold the steering wheel and work the accelerator and brake; the driver is also handling many complex tasks.


You have to watch traffic signals, read all kinds of roadside signs, and judge what is on the road. If a little duck is waddling slowly across the road ahead, you have to brake; but if it's a bird, you can assume it will fly away as the car approaches, so you don't need to slow down. If there's a plastic bag on the road, you can simply drive over it; but if it's a rock, you have to go around it.


You have a deep understanding of road conditions that comes from a lifetime of accumulated experience. At the very least you know what a plastic bag is, what a stone is, and what a bird is, but the car does not.


Teaching the car all of this is extremely hard. With existing pattern-recognition capabilities, a computer can fail to recognize an obstacle simply because it is viewed from a different angle. What's more, knowledge about the road is endless: you cannot possibly spell out every fact for the computer, and the computer itself has no capacity for thought.


At present, autonomous driving is all narrow AI, built on machine learning. The computer treats every object on the road, including buildings, other cars, and pedestrians, as a three-dimensional model, and makes no attempt to understand what those objects are.




The computer only cares about how these objects are moving: it estimates each object's speed, predicts its path, and checks whether that path will conflict with the car's own. If there is a conflict, it brakes or steers around.
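As a rough sketch of that logic, assume each tracked object is reduced to a position and an estimated velocity, its future path is extrapolated, and the system merely checks for conflicts with the ego vehicle's path. The following minimal Python example is illustrative only; every name and parameter is invented, and no production system is this simple:

```python
import math
from dataclasses import dataclass

@dataclass
class TrackedObject:
    x: float   # position in the ego vehicle's frame (meters)
    y: float
    vx: float  # estimated velocity (m/s)
    vy: float

def predict_position(obj: TrackedObject, t: float):
    """Constant-velocity extrapolation: where the object will be after t seconds."""
    return obj.x + obj.vx * t, obj.y + obj.vy * t

def route_conflict(obj: TrackedObject, ego_speed: float,
                   horizon: float = 3.0, step: float = 0.1,
                   safety_radius: float = 2.0) -> bool:
    """Check whether the object's predicted path comes too close to the ego
    vehicle's, which is assumed here to drive straight along the x-axis."""
    t = 0.0
    while t <= horizon:
        ox, oy = predict_position(obj, t)
        ex, ey = ego_speed * t, 0.0  # ego position at time t
        if math.hypot(ox - ex, oy - oy + oy - ey) < safety_radius:
            return True              # conflict: brake or steer around
        t += step
    return False

# A pedestrian 20 m ahead, drifting into the lane from the right:
print(route_conflict(TrackedObject(x=20.0, y=-3.0, vx=0.0, vy=1.5), ego_speed=10.0))
```

Note that nothing in this loop asks what the object *is*; a duck, a bird, and a plastic bag are all just positions and velocities, which is exactly the limitation the article describes.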


But real roads are full of surprises. Google has been training self-driving technology for years and has run into all kinds of strange situations. Once, some kids were playing with frogs on the highway. Another time, a woman in an electric wheelchair was chasing a duck in the middle of the road: the duck ran in circles, and she chased it in circles. Can you accurately predict the course of action of people in situations like these?

Self-driving cars identify surrounding objects by firing lasers at them and reading the reflections. But in snow or rain, the laser pulses may bounce off snowflakes or raindrops, and the car can seriously misjudge its surroundings.


And can we guarantee that computers understand roadside signs marking speed limits or warning drivers to slow down? Pattern recognition remains genuinely hard: Google's image-recognition system once infamously mislabeled photos of Black people as gorillas. And if a sign is damaged or has a small advertisement pasted over it, the car may fail to recognize it at all.


In 2016, a Tesla owner improperly handed full control to the car's autonomous driving system. The car failed to recognize a white truck ahead, perhaps mistaking it for white clouds in the sky, and the driver died on the spot. The driver was at fault, of course, but the accident shows how prone autonomous driving technology is to failure.


However, the "large vision model" may change all of this.


In September 2023, OpenAI released a preview of GPT-4V, which can understand pictures, even interpreting e-sports matches. In other words, GPT has a strong ability to understand what is happening in images and video. In tests across images and videos of different driving scenes, GPT-4V made striking breakthroughs, showing potential beyond today's autonomous driving systems.



Moreover, large models can do more than recognize data; they can also generate autonomous driving data. Wayve, a self-driving company from the UK, has made such an attempt with a generative AI model called GAIA-1: given video and text as input, the AI creates realistic driving videos to order.


GAIA-1 learns and understands many driving concepts, including cars, pedestrians, road layout, traffic lights, and buildings. It can generate many complex road scenarios, which is very helpful for vision-based autonomous driving systems.
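To make the idea concrete, here is a hypothetical Python sketch of driving such a generative world model. Wayve has not published a public GAIA-1 API, so every name below is an assumption; the point is only the data flow, conditioning future video on past frames plus a text prompt:

```python
from dataclasses import dataclass, field

@dataclass
class Scenario:
    context_frames: list = field(default_factory=list)  # a few seconds of real video
    prompt: str = ""  # desired event, e.g. "a pedestrian steps into the road"

def generate_rollout(model, scenario: Scenario, n_frames: int) -> list:
    """Roll the world model forward: each new frame is predicted from all
    frames so far, conditioned on the text prompt."""
    frames = list(scenario.context_frames)
    for _ in range(n_frames):
        frames.append(model(frames, scenario.prompt))
    return frames[len(scenario.context_frames):]

# Toy stand-in for the learned network: it just echoes the last frame.
# A real world model would sample a plausible, novel continuation instead.
rollout = generate_rollout(lambda frames, prompt: frames[-1],
                           Scenario(context_frames=["frame_0"], prompt="rainy night"),
                           n_frames=5)
print(rollout)  # generated frames usable as synthetic training data
```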


It is worth mentioning that researchers from UC Berkeley and Johns Hopkins University have proposed a new modeling approach that trains large vision models without using any language data.


Put simply, a large vision model can learn to understand and process complex visual information just by looking at pictures, without relying on language data. The development of large vision models has only just begun, and enormous potential remains untapped. That is a major boon for Tesla's vision-only approach to autonomous driving.
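As a toy sketch of that language-free recipe, suppose images are first quantized into discrete tokens (the published work uses a VQ-style image tokenizer) and a plain causal transformer is trained to predict the next visual token. The PyTorch below is illustrative, not the authors' code; positional embeddings and other details are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN = 1024, 256  # visual-token vocabulary size and tokens per image

class NextVisualToken(nn.Module):
    """Causal transformer over discrete image tokens; no text anywhere."""
    def __init__(self, d=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, VOCAB)

    def forward(self, tokens):
        n = tokens.size(1)
        # Upper-triangular -inf mask so each position only attends to the past.
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        return self.head(self.blocks(self.embed(tokens), mask=causal))

# Random ids stand in for the output of an image tokenizer.
tokens = torch.randint(0, VOCAB, (4, SEQ_LEN))
logits = NextVisualToken()(tokens[:, :-1])  # predict token t+1 from tokens 1..t
loss = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
print(loss.item())  # training would minimize this next-token loss
```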


Why I advise you not to overestimate large models


Today, in the field of autonomous driving, new concepts appear one after another, and whenever a new technology arrives, someone exclaims that a new era is about to begin!


But most people don't realize that the boundary of autonomous driving is the boundary of artificial intelligence, and the boundary of artificial intelligence is the boundary of mathematics. Yes, mathematics has boundaries.


In 1931, the mathematician Kurt Gödel showed that the long effort to build a mathematical system that is both complete and consistent was headed in the wrong direction: no system can be both. In other words, if completeness is ensured, contradictions appear; if consistency is ensured, there will be true statements that cannot be proved by logical reasoning within the system. This was a reminder that mathematics is not omnipotent, and that many problems in the world are not mathematical problems.


For example: you are driving fast and suddenly find a group of schoolchildren scuffling in the road ahead. To avoid them, you would have to hit the wall of a roadside building, and hitting the wall would put your own life in danger. In this situation, do you choose the wall or the children?




If a car manufacturer tells you its cars are ethical, that its autonomous driving system will put pedestrians' safety first in this situation, would you buy such a car?


Would you let the car make that decision at your expense? Clearly, this is an ethical question with no standard answer, and no amount of artificial intelligence, however powerful, can compute one.


Second, many problems simply cannot be computed, no matter what model is used or how much computing power is available.


Mathematics offers another classic result here. In 1900, the mathematician David Hilbert asked: for a given class of mathematical problems, is there a method that can determine in a finite number of steps whether a solution exists? The answer, as was later proved, is no: even when algorithms exist for many individual problems, there is no general procedure for deciding whether a problem is solvable.


Autonomous driving falls into exactly this class of problem: we do not know whether a solution exists.


Today, experts everywhere say that with enough data, large autonomous driving models will mature sooner or later. In fact, for an autonomous driving system, a small fraction of the data, perhaps 2%, is enough to train a model that handles 80% of road situations; but the remaining 20% may never be solved, no matter how much data you use.


For example, Musk's pure vision FSDV12. In imagination, the pure vision solution has ready-made AI algorithms that can be imitated, but in the actual mass production process, there are countless details that need to be improved. In imagination, as long as the algorithm is logically perfect, it will be enough. , but in fact the algorithm requires large-scale data feeding.


Consider what Musk has poured into Tesla FSD. During FSD's development, Tesla accumulated more than 9 billion miles of usage, the world's largest source of autonomous driving data. To exploit that data, Tesla keeps expanding its supercomputing clusters, recruiting top AI engineers, and developing its own algorithms, chips, and high-powered GPUs.


Yet even that may not be enough to feed the model. Musk has publicly admitted that he underestimated the difficulty of the vision-only approach and regrets it.




Why is this? Consider that all 50 US states have their own traffic regulations, and climate and road conditions differ from place to place, to say nothing of the differences between the United States and China. What does this mean? It means an autonomous driving solution trained in one region can be all but useless in another. Any large autonomous driving model therefore has serious limitations and cannot be universal; you must collect large amounts of data in every region.

