UniAD large model paves the way for smart cars to enter the AGI era

Published by 和谐共存 | Last updated: 2024-05-13 | Source: HiEV大蒜粒车研所

At the recently concluded Beijing Auto Show, beyond the debut of a number of star models, supply chain companies also showed off their strength. End-to-end large models in particular set off a new wave of enthusiasm for the next-generation intelligent driving technology stack.


As the first company to propose a general model that integrates perception and decision-making for autonomous driving, SenseTime used this auto show to publicly demonstrate, for the first time, the road-test performance of UniAD (Unified Autonomous Driving), its mass-production-oriented end-to-end autonomous driving solution.


UniAD makes intelligent driving "human-like"


Many users have likely noticed that since the start of this year, almost every automaker and supply chain company has described its intelligent driving as "comparable to a human driver" in its marketing. One important goal behind the emergence of end-to-end large models is precisely to solve this problem of "human-like driving".


At the Beijing Auto Show, SenseTime Jueying demonstrated the results of real-world road tests conducted without high-precision maps and relying solely on visual perception: whether on urban roads or rural roads without lane markings, vehicles equipped with UniAD could accurately complete a series of difficult maneuvers, such as making a sharp left turn onto a bridge, avoiding vehicles occupying the lane and construction zones, and detouring around running pedestrians.


Urban roads are usually complex, and rural roads are even more unpredictable: you never know when a slow-moving vehicle will block the road, when an electric tricycle will dart out from the side, or when construction or a temporary closure will appear ahead. These unscripted driving scenarios, which a human driver resolves through experience, are known in the autonomous driving field as corner cases.


Although such scenarios may account for less than 10% of the overall driving process, handling them is the key to moving intelligent driving from a "niche technology" to winning the public's trust, and it is central to the safety of intelligent driving systems. As urban driving becomes the main battlefield of high-end intelligent driving competition, the complexity of the scenarios to be handled grows exponentially. Pouring in human effort only adds a limited number of rules and cannot cope with an unlimited number of complex scenarios and long-tail road conditions. Against this backdrop, end-to-end technology opens up a new path, shifting the development paradigm of intelligent driving from heavy manpower investment to sustained computing power investment and high-quality data input.


At the beginning of this year, Tesla began rolling out its end-to-end FSD V12 to some users, and end-to-end intelligent driving solutions have been multiplying across the industry. Most of them, however, adopt a "two-stage" architecture composed of separate perception and decision-making models, which is easier to implement but still suffers from information being filtered or lost between the two models. UniAD instead integrates perception, decision-making, planning and other modules into a single full-stack Transformer end-to-end model, a technology stack in which perception and decision-making are fully integrated.
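
A minimal sketch, in generic PyTorch, of what "fully integrating perception and decision-making" means at the code level: a single Transformer backbone feeds both a detection head and a planning head, so the planning loss can back-propagate into perception. All module names, dimensions and heads here are hypothetical illustrations, not SenseTime's actual UniAD implementation.

```python
# Hypothetical sketch of a unified (one-stage) end-to-end driving model:
# perception and planning share one Transformer backbone and are trained jointly,
# unlike a "two-stage" design where a perception model hands results to a planner.
import torch
import torch.nn as nn

class UnifiedE2EModel(nn.Module):
    def __init__(self, d_model=256, n_layers=6, n_agent_queries=32, n_plan_points=10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.agent_queries = nn.Parameter(torch.randn(n_agent_queries, d_model))
        self.det_head = nn.Linear(d_model, 7)                   # box: x, y, z, w, l, h, yaw
        self.plan_head = nn.Linear(d_model, n_plan_points * 2)  # future (x, y) waypoints

    def forward(self, camera_tokens):
        # camera_tokens: (B, N, d_model) image features from the surround cameras
        b = camera_tokens.size(0)
        queries = self.agent_queries.unsqueeze(0).expand(b, -1, -1)
        fused = self.backbone(torch.cat([camera_tokens, queries], dim=1))
        agent_feats = fused[:, -queries.size(1):]               # slice the query outputs back out
        detections = self.det_head(agent_feats)                 # perception output
        trajectory = self.plan_head(agent_feats.mean(dim=1))    # one planned trajectory
        return detections, trajectory.view(b, -1, 2)

# Example: 2 samples, 128 image tokens of width 256
dets, traj = UnifiedE2EModel()(torch.randn(2, 128, 256))
```

Because both heads share the same backbone, a planning error can adjust how the model perceives the scene, which is exactly the coupling a two-stage pipeline loses when information is filtered between separate models.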


Backed by the computing power, high-quality simulation data and model performance of SenseTime's large-scale AI infrastructure, UniAD's end-to-end solution has a higher capability ceiling and strong learning and reasoning abilities. The data-driven end-to-end model also generalizes well and iterates quickly, helping automakers roll out city driving functions rapidly and at low cost, while the map-free, vision-only perception scheme further cuts the system's hardware and software costs, helping popularize intelligent driving and enabling it to operate across the country. Building on the end-to-end system, SenseTime Jueying also introduced DriveAGI, its new-generation autonomous driving large model, at this auto show, pushing autonomous driving from data-driven toward cognition-driven.


The large model lands on the Xiaomi SU7, upgrading the smart cockpit


Earlier, on April 23, SenseTime released the newly upgraded "SenseNova 5.0" large model. The 600-billion-parameter SenseNova 5.0 reportedly adopts a mixture-of-experts (MoE) architecture with stronger knowledge, mathematics, reasoning and coding capabilities, making it the first large model in China to fully match or even surpass GPT-4 Turbo, while its multimodal capabilities lead GPT-4V.
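
For context on the term, a mixture-of-experts layer routes each token to a small subset of expert feed-forward networks chosen by a learned router, which lets a model grow its total parameter count without a proportional increase in per-token compute. The snippet below is a generic, minimal top-k MoE layer in PyTorch; the sizes and routing scheme are illustrative assumptions, not SenseNova 5.0's actual architecture.

```python
# Generic top-k mixture-of-experts (MoE) layer: a learned router sends each token
# to a few expert feed-forward networks and blends their outputs. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (B, T, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Example: each of the 16 tokens activates only 2 of the 8 experts
y = MoELayer()(torch.randn(1, 16, 512))
```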


Built on an architecture that combines on-device and cloud models, SenseTime's edge-side large model significantly outperforms models of the same scale and is comparable to 7B and 13B models, making it well suited to in-vehicle deployment. According to official sources, the recently launched Xiaomi SU7 brings an AI large model into the cockpit, and SenseTime's SenseNova large model fully supports the in-vehicle voice applications of Xiaomi's Xiao Ai assistant. But this is not all that SenseTime Jueying is doing to upgrade the cockpit experience.


SenseTime Jueying reportedly drew inspiration from the Apple Vision Pro released last year. Leveraging its perception technology and rapid iteration, it brought two new 3D cockpit interaction demos to the show, 3D Gaze high-precision gaze interaction and 3D dynamic gesture interaction, letting the audience experience a more intuitive way of interacting with the cabin and pushing cockpit interaction toward safer, more convenient 3D interaction. In the on-site demonstration, the operator sat in front of a screen mimicking a cockpit and controlled it through the two interaction modes, in a manner similar to the Apple Vision Pro.


This is said to be the world's first smart cockpit technology that lets users interact with on-screen icons through their gaze. "Contactless" interaction has become an inevitable trend in the smart cockpit interaction revolution, and gaze interaction is one of the most direct and convenient approaches. In the past, however, limited accuracy meant that driver monitoring systems (DMS) could usually only recognize coarse regions in the cabin, for purposes such as distraction monitoring, and could hardly support specific interactive actions.


SenseTime Jueying's 3D Gaze is the world's first high-precision gaze interaction of its kind. By improving gaze accuracy, it can precisely identify where the driver is looking on the central control screen, or which icon a rear-seat passenger is looking at on the rear screen, and respond accordingly, realizing "what you see is what you choose". Behind this is a set of high-precision three-dimensional eyeball models built on the idea of "a thousand people, a thousand pairs of eyes": SenseTime Jueying uses eye tracking technology and high-precision eye imaging equipment to collect and analyze each driver's eye data and build a personalized eyeball model. Combined with sub-pixel-level detail localization and information fusion techniques, this overcomes the gaze accuracy problem in the cockpit, and together with gestures, voice and even blinking it can deliver a smarter, more personalized in-cabin visual interaction experience.
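
To make "what you see is what you choose" concrete, the sketch below shows one simple way gaze-to-icon selection could work: intersect the estimated 3D gaze ray with the screen plane and snap the hit point to the nearest icon. The coordinate frame, thresholds and icon names are hypothetical illustrations of the idea, not SenseTime's method.

```python
# Hedged illustration: map a 3D gaze ray from eye tracking onto a screen plane
# and select the nearest icon, only if the gaze lands close enough to it.
import numpy as np

def gaze_to_icon(eye_pos, gaze_dir, screen_origin, screen_normal, icons, max_dist=0.03):
    """eye_pos, gaze_dir: 3D vectors in the cabin frame (metres).
    screen_origin / screen_normal: a point on the screen plane and its unit normal.
    icons: dict of icon name -> 3D centre of the icon on the screen plane."""
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    denom = gaze_dir @ screen_normal
    if abs(denom) < 1e-6:                       # ray parallel to the screen
        return None
    t = ((screen_origin - eye_pos) @ screen_normal) / denom
    if t <= 0:                                  # screen is behind the viewer
        return None
    hit = eye_pos + t * gaze_dir                # 3D gaze point on the screen
    name, dist = min(((n, np.linalg.norm(hit - c)) for n, c in icons.items()),
                     key=lambda x: x[1])
    return name if dist <= max_dist else None   # select only if the gaze is close enough

# Example: a driver glancing at a hypothetical "navigation" icon
icons = {"navigation": np.array([0.40, 0.10, 0.55]),
         "media":      np.array([0.48, 0.10, 0.55])}
print(gaze_to_icon(np.array([0.0, 0.6, 0.8]), np.array([0.45, -0.55, -0.28]),
                   np.array([0.44, 0.10, 0.55]), np.array([0.0, 0.0, 1.0]), icons))
```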


3D dynamic gesture interaction is based on ultra-high-precision 3D hand reconstruction, capturing, recognizing and analyzing the user's 3D gestures in real time so that vehicle functions can be controlled by gesture alone. SenseTime Jueying says that combining these two 3D interaction capabilities in the cabin, much like a glasses-free, in-car Vision Pro, revolutionizes cockpit interaction, freeing users from traditional physical buttons and touchscreens and creating a more natural interaction experience that better matches human intuition.
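
As a minimal illustration of gesture-driven control, the sketch below detects a simple "pinch" from 3D hand keypoints (assuming a MediaPipe-style 21-joint layout) and maps it to a vehicle function. The threshold, keypoint layout and action mapping are assumptions for illustration, not the production pipeline.

```python
# Hypothetical sketch: detect a "pinch" gesture from reconstructed 3D hand
# keypoints and dispatch it to an in-cabin action.
import numpy as np

PINCH_THRESHOLD_M = 0.02   # thumb tip and index tip closer than 2 cm counts as a pinch

def detect_pinch(keypoints_3d):
    """keypoints_3d: (21, 3) array of hand joints in metres;
    index 4 = thumb tip, index 8 = index-finger tip (MediaPipe-style layout)."""
    return np.linalg.norm(keypoints_3d[4] - keypoints_3d[8]) < PINCH_THRESHOLD_M

ACTIONS = {"pinch": lambda: print("Confirm selection on the centre screen")}

def dispatch(keypoints_3d):
    if detect_pinch(keypoints_3d):
        ACTIONS["pinch"]()

# Example frame where the thumb and index fingertips nearly touch
frame = np.zeros((21, 3))
frame[4] = [0.10, 0.02, 0.30]
frame[8] = [0.11, 0.02, 0.30]
dispatch(frame)
```

In a real system the same dispatch pattern would sit behind a learned gesture classifier rather than a hand-written distance rule, but the flow from reconstructed 3D keypoints to a vehicle function is the same.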


Jueying searches for its own "world"


"The competition for future car intelligence is essentially a competition of integrated applications of general artificial intelligence technology ." This is the view of Wang Xiaogang, co-founder, chief scientist, and president of Jueying Intelligent Vehicle Business Group of SenseTime. In the wave of smart cars, SenseTime Jueying positions itself as a core technology partner that accelerates smart cars into the AGI era, deeply integrates artificial intelligence technology with the automotive industry, builds a general artificial intelligence (AGI) technology architecture of the cockpit-cloud trinity, and creates a diversified product system of smart driving , smart cockpit and AI cloud.


In terms of technical strength, SenseTime Jueying, backed by SenseTime, has an unquestionable technical pedigree and R&D capability. But however good the technology, it needs strong products to drive adoption and serve as an endorsement, and for that SenseTime Jueying still needs more heavyweight partners. As of December 2023, SenseTime Jueying had cooperated with more than 30 domestic and foreign automakers, including Honda, BYD, Great Wall, GAC, Hongqi, Zeekr, Nezha, Chery and NIO, covering more than 90 models and delivering a cumulative total of 1.95 million smart cars.


Among them are high-end models such as the GT, flagship of GAC Aion's premium brand Haobo, as well as mid-range models such as the Zeekr X and Chery's Jetour Traveler. Even so, SenseTime Jueying still needs more heavyweight, high-volume models to make its presence felt in the mainstream consumer market.
