CES in the United States has long been an important window onto global technology trends. At CES 2024, two leading figures in artificial intelligence, Fei-Fei Li and Andrew Ng, held a conversation in which they raised a point that could reshape the autonomous driving industry.
Large AI models, they argued, have begun to evolve from "large language models" into "large vision models". A large model can now not only understand language but also generate and analyze images, letting computers grasp what an image actually means, and this could bring a qualitative leap to autonomous driving.
In this regard, this article is divided into two parts:
Why do Fei-Fei Li and Andrew Ng say the "large vision model" will bring a qualitative leap to autonomous driving?
Why shouldn't we place too much faith in large AI models for autonomous driving?
Will large vision models bring a disruptive revolution?
When a person drives a vehicle, he or she is not simply holding the steering wheel and working the accelerator and brake; the driver is also handling many complex things at once.
You have to watch traffic signals and all kinds of roadside signs, and judge what is on the road. If a duckling is waddling slowly across the road ahead, you brake; if it is a bird, you can assume it will fly off as the car approaches, so there is no need to slow down. A plastic bag on the road you can simply run over; a rock you have to steer around.
This deep understanding of road conditions comes from your accumulated life experience. At a minimum, you know what a plastic bag is, what a rock is, and what a bird is. The car knows none of this.
Teaching a car these things is extremely hard. With existing pattern recognition, a computer can fail to recognize an obstacle simply because it is seen from a different angle. Worse, the knowledge relevant to driving is endless: you cannot possibly enumerate every fact for the computer, and the computer has no ability to reason on its own.
At present, autonomous driving is all narrow AI, built on machine learning. The computer treats everything on the road, including buildings, other cars, and pedestrians, as three-dimensional objects, and it makes no attempt to understand what they are.
The computer only cares about how these objects are moving: it estimates each object's speed, predicts its path, and checks whether that path conflicts with the car's own. If there is a conflict, it brakes or steers around.
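A minimal sketch of this predict-and-check logic is below. All names and parameters are illustrative, and a constant-velocity motion model is assumed; real systems use far richer motion models and uncertainty estimates.

```python
import numpy as np

def predict_path(position, velocity, horizon_s=3.0, dt=0.1):
    """Extrapolate an object's future path assuming constant velocity."""
    t = np.arange(1, int(horizon_s / dt) + 1)[:, None] * dt
    return position + velocity * t          # (steps, 2) array of future x/y points

def paths_conflict(ego_path, obj_path, safety_radius_m=2.0):
    """Flag a conflict if the two paths ever come too close at the same time step."""
    gap = np.linalg.norm(ego_path - obj_path, axis=1)
    return bool((gap < safety_radius_m).any())

# Ego car heading north at 10 m/s; a pedestrian crossing its lane from the right.
ego = predict_path(np.array([0.0, 0.0]), np.array([0.0, 10.0]))
ped = predict_path(np.array([5.0, 15.0]), np.array([-3.0, 0.0]))
if paths_conflict(ego, ped):
    print("Conflict predicted: brake or plan around")   # this example triggers
```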
But real roads are full of surprises. Google has been training its self-driving technology for years and has run into all sorts of strange situations. Once, some children were playing with frogs on the highway. Another time, a woman in an electric wheelchair chased a duck in the middle of the road; the duck ran in circles, and she chased it in circles. Can you accurately predict what people will do next in situations like these?

Self-driving cars also identify surrounding objects by firing lasers and measuring the reflections. In snow or rain, the laser may bounce off snowflakes or raindrops instead, and the car can badly misjudge what is around it.
And can a computer be guaranteed to read roadside signs marking speed limits or slow zones? Pattern recognition remains hard and failure-prone: Google's image recognition once infamously mislabeled photos of Black people as gorillas. And if a sign is damaged or has a flyer pasted over it, the car may fail to recognize it at all.
In 2016, a Tesla owner ignored the rules and handed full control to the autonomous driving system. The car failed to recognize a white truck ahead, apparently unable to distinguish it from the bright sky, and the driver was killed on the spot. The driver was at fault, of course, but the accident also shows how easily such systems can fail.
However, the "big visual model" may change all this.
In September 2023, OpenAI released a preview of GPT-4V, which can understand pictures, even interpreting e-sports matches. In other words, GPT now has a strong ability to understand what is happening in images and video. In tests on images and video of different driving scenes, GPT-4V made striking breakthroughs, showing potential beyond today's autonomous driving systems.
Moreover, large models can do more than recognize data; they can also generate autonomous driving data. Wayve, a self-driving company from the UK, has made such an attempt with a generative AI model called GAIA-1: given video and text as input, the AI creates realistic driving videos to order.
GAIA-1 learns and understands many driving concepts, including cars, pedestrians, road layout, traffic lights, and buildings, and it can generate many kinds of complex road conditions, which is very useful for vision-based autonomous driving systems.
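To make the idea concrete, here is a hedged sketch of the generic pattern behind such a "world model": video is compressed into discrete tokens, and the model autoregressively samples the next token given everything so far. The function and tensor names are invented for illustration; this is not Wayve's actual API.

```python
import torch

@torch.no_grad()
def rollout(world_model, prompt_tokens, n_new, temperature=1.0):
    """Sample future visual tokens one at a time from an autoregressive model.

    In a GAIA-1-style system the prompt would be VQ-encoded video frames,
    optionally mixed with text tokens, and the sampled tokens would be
    decoded back into video frames by a separate decoder (not shown).
    """
    tokens = prompt_tokens.clone()
    for _ in range(n_new):
        logits = world_model(tokens)[:, -1] / temperature  # next-token logits
        probs = torch.softmax(logits, dim=-1)
        next_tok = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_tok], dim=1)
    return tokens
```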
It is worth mentioning that researchers from UC Berkeley and Johns Hopkins University have proposed a new modeling method that can train large vision models without using any language data.
Put simply, a large vision model can learn to understand and process complex visual information purely by looking at pictures, with no reliance on language data. The development of large vision models has clearly only just begun, and enormous potential remains untapped. This is a major boost for Tesla's pure-vision approach to autonomous driving.
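A toy version of that vision-only training objective might look like the following: images are first compressed into discrete tokens (for example by a VQ-style encoder, not shown), and a causal transformer is trained to predict each next token, with no text labels anywhere. All sizes are illustrative, not the researchers' actual configuration.

```python
import torch
import torch.nn as nn

VOCAB = 8192     # size of an assumed VQ-style visual codebook
CONTEXT = 256    # tokens per "visual sentence"

class TinyVisualLM(nn.Module):
    """Causal transformer over discrete visual tokens (illustrative only)."""
    def __init__(self, d_model=256, n_head=4, n_layer=2):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(CONTEXT, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_head, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        T = tokens.size(1)
        # Causal mask: True marks positions a token may NOT attend to.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        return self.head(self.blocks(x, mask=mask))

model = TinyVisualLM()
tokens = torch.randint(0, VOCAB, (8, CONTEXT))    # stand-in for encoded images
logits = model(tokens[:, :-1])                    # predict token t+1 from 1..t
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
loss.backward()
```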
Why I advise you not to overestimate large models
Today the field of autonomous driving churns out one concept after another, and whenever a new technology appears, someone exclaims that a new era is about to dawn!
But in fact, most people don’t realize that the boundary of autonomous driving is the boundary of artificial intelligence, and the boundary of artificial intelligence is the boundary of mathematics. Yes, mathematics has boundaries.
In 1931, the mathematician Gödel showed that the many mathematicians trying to build a mathematical system that is both complete and consistent were headed in the wrong direction: no such system can be both. In other words, if you insist on completeness, you get contradictory conclusions; if you insist on consistency, there will be statements that cannot be proved by logical deduction within the system. This was a reminder that mathematics is not omnipotent, and that many problems in the world are not mathematical problems.
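Stated slightly more precisely (an informal paraphrase of the first incompleteness theorem, not the author's wording):

```latex
% Gödel's first incompleteness theorem, informal paraphrase:
% any consistent, effectively axiomatized system F that contains
% elementary arithmetic leaves some sentence G_F undecided.
F \text{ consistent and effective, } F \supseteq \text{arithmetic}
\;\Longrightarrow\;
\exists\, G_F :\quad F \nvdash G_F \ \text{ and } \ F \nvdash \lnot G_F
```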
For example: you are driving fast when you suddenly see a group of schoolchildren scuffling in the road ahead. To avoid them you would have to hit the wall of a roadside building, and hitting the wall would put your own life in danger. In that situation, do you hit the wall or the children?
If a car manufacturer tells you, "our cars are ethical; in this situation our autonomous driving system will put pedestrian safety first," would you buy such a car?
Would you let the car make that decision at your expense? Clearly, this is an ethical question with no standard answer, and no amount of artificial intelligence can compute one.
Second, many problems simply cannot be computed, no matter what model you use or how much computing power you have.
Mathematics offers another classic result here. In 1900, the mathematician Hilbert asked: for a certain class of mathematical problems, is there a method that can determine, in a finite number of steps, whether a solution exists? The answer eventually given was no: for some classes of problems, no such general decision procedure can exist, so even with an algorithm in hand you may never know whether a solution exists.
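The flavor of that answer is captured by Turing's halting-problem argument, sketched below. Here `halts` is a hypothetical universal decider that, as the contradiction shows, cannot actually exist.

```python
def make_paradox(halts):
    """Build a program that defeats any claimed halting decider.

    `halts(program, argument)` is supposed to return True iff
    program(argument) eventually stops. No such function can exist.
    """
    def paradox(program):
        if halts(program, program):   # predicted to halt...
            while True:               # ...so loop forever instead
                pass
        return None                   # predicted to loop -> halt at once
    return paradox

# Feed `paradox` its own source: if halts(paradox, paradox) is True,
# paradox(paradox) loops forever; if it is False, paradox(paradox)
# halts immediately. Either way the decider is wrong, so the halting
# problem is undecidable.
```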
Autonomous driving falls into exactly this class of problem: we do not know whether a solution exists.
Today every expert says that with enough data, large autonomous driving models will mature sooner or later. In reality, for an autonomous driving system, 20% of the data is enough to train a model with which the system can handle 80% of road situations; but the remaining 20% of situations, the long tail, may never be solved no matter how much data you use.
Take Musk's pure-vision FSD V12. On paper, the pure-vision approach has ready-made AI algorithms to imitate; in actual mass production, countless details still have to be worked out. On paper, a logically sound algorithm should be enough; in practice, the algorithm must be fed with data at enormous scale.
Consider the resources Musk has poured into Tesla FSD. Over the course of FSD's development, Tesla has accumulated more than 9 billion miles of driving, the world's largest source of autonomous driving data. To exploit that data, Tesla keeps expanding its supercomputing clusters, poaching top AI engineers, and developing its own algorithms, chips, and high-power GPUs.
But even that may not be enough. Musk has publicly admitted that he underestimated the difficulty of the pure-vision approach and regrets it.
Why is this? Consider that all 50 US states have their own traffic regulations, and climate and road conditions vary from place to place, to say nothing of the differences between the United States and China. What does this mean? It means an autonomous driving solution trained in one region can be useless in another. Any large autonomous driving model therefore has serious limitations and cannot be universal; you must collect large amounts of data in every region.