
China's answer to GPT-4o goes viral: the country's first streaming native multimodal interaction model, real-time and smooth in a live demo

Latest update time: 2024-07-05
Jin Lei from WAIC
Quantum Bit | Public Account QbitAI

Before GPT-4o's real-time voice mode went live to the public, SenseTime shipped its own "Her" first!

Just now, SenseTime put on a live demo that brought down the house. Without further ado, here are the results:

After the series of live demos, the audience burst into applause and exclamations of "Wow".

This is the work of SenseNova 5o, part of the newly released SenseNova 5.5 series with 600 billion parameters, and the first streaming native multimodal interaction model in China.

It is understood that this is a new AI interaction mode integrating all modalities, including text, sound, image, and video, allowing AI to communicate with people more vividly and richly.

Also unveiled was Vimi, the first large model for controllable character video generation, built on the new capabilities of SenseNova 5.5.

It needs only a single photo in any style to do the job, is usable by ordinary users, and can generate video up to one minute long~

Keep in mind that "controllable characters" has long been a hard problem for large models. Even models like Sora struggle with imprecise motion control and unstable continuity (sudden changes in a character's face).

But Vimi is different. It can not only precisely control a character's facial expressions, but also adjust the character's natural posture within a bust-length frame.

It can also automatically generate changes in hair, clothing, and background that match the character, with durations reaching the minute level.

So in the future, creating your own blockbuster, say a Snow Queen, will take just one photo: