This domestic AI "director" is seriously impressive: camera and objects in short videos each move their own way | City University of Hong Kong & Kuaishou & Tianjin University
Qubit | Official account QbitAI
Xifeng, from Aofei Temple
Kuaishou has set its sights on AI video and helped develop an intelligent "director".
Direct-a-Video successfully decouples object motion from camera movement in AI-generated videos, greatly improving flexibility and controllability!
Don't believe it? Take a look at some of its work.
In the short videos, the camera's movement follows the director's instructions exactly: horizontal (X-axis), vertical (Y-axis), and zoom are all precisely controlled:
The AI director can also pull off fancier moves, panning the camera horizontally and vertically at the same time:
Mixed horizontal-pan and zoom motion is available too:
In addition, the director can have each "actor" in the video move along a user-drawn box:
Camera movement and actor movement can also be combined.
For example, as the big bear walks through space, the camera pans horizontally and vertically, producing the overall motion of the video:
Of course, the bear can also be moved from one spot to another by drawing boxes connected with an arrow:
You can even control the movement paths of multiple "actors" at the same time:
This is the effect of Direct-a-Video, a text-to-video generation framework jointly proposed by researchers from City University of Hong Kong, Kuaishou Technology, and Tianjin University.
How is it done?
Specifically, Direct-a-Video is divided into two stages: camera movement control is learned during the training phase, and object motion control is implemented during the inference phase.
For camera movement control, the researchers adopt the pre-trained ZeroScope text-to-video model as the base and introduce a new trainable temporal self-attention layer (the camera module). The pan and zoom parameters are mapped through Fourier encoding and an MLP into an embedding, which is injected into this layer.
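Below is a minimal PyTorch sketch of what such a camera module could look like, assuming three scalar camera parameters (x-pan, y-pan, zoom); the class and function names are illustrative and not taken from the official Direct-a-Video code:

```python
import math
import torch
import torch.nn as nn

def fourier_encode(x: torch.Tensor, num_bands: int = 8) -> torch.Tensor:
    """Map raw camera parameters to sin/cos Fourier features."""
    freqs = (2.0 ** torch.arange(num_bands, device=x.device)) * math.pi
    angles = x[..., None] * freqs                  # (B, 3, num_bands)
    feats = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return feats.flatten(-2)                       # (B, 3 * 2 * num_bands)

class CameraModule(nn.Module):
    """Trainable temporal self-attention layer conditioned on pan/zoom."""
    def __init__(self, dim: int, num_bands: int = 8, heads: int = 8):
        super().__init__()
        self.num_bands = num_bands
        # MLP turns Fourier features of (pan_x, pan_y, zoom) into an embedding
        self.mlp = nn.Sequential(
            nn.Linear(3 * 2 * num_bands, dim), nn.SiLU(), nn.Linear(dim, dim)
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frames: torch.Tensor, cam_params: torch.Tensor):
        # frames: (B, T, dim) per-frame features; cam_params: (B, 3)
        cam_emb = self.mlp(fourier_encode(cam_params, self.num_bands))
        h = frames + cam_emb[:, None, :]           # inject embedding into each frame
        out, _ = self.attn(h, h, h)                # self-attention across time
        return frames + out                        # residual connection
```

Since the conditioning enters only through this added layer, the pre-trained base model can stay frozen while the camera module alone is trained.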
The training strategy is self-supervised learning with data augmentation, which lets the camera module be trained on limited data with no manual motion annotation.
Data augmentation, in layman's terms, means adding slightly modified copies of existing data, or creating new synthetic data from existing data, to increase the amount of training data.
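For camera motion, such augmentation can double as free supervision: shifting and scaling a crop window across existing frames fakes pans and zooms, and the augmentation parameters themselves become the training labels. The sketch below illustrates that idea under those assumptions; it is not the paper's actual pipeline:

```python
import torch
import torch.nn.functional as F

def synthesize_camera_motion(video, pan_x=0.2, pan_y=0.0, zoom=1.2):
    """video: (T, C, H, W); pan_* in [-1, 1]; zoom >= 1."""
    T, C, H, W = video.shape
    frames = []
    for t in range(T):
        a = t / max(T - 1, 1)                      # progress through the clip
        scale = 1.0 / (1.0 + a * (zoom - 1.0))     # crop shrinks as we zoom in
        ch, cw = int(H * scale), int(W * scale)
        # the crop center drifts linearly to fake a camera pan
        cy = int((H - ch) * (0.5 + 0.5 * a * pan_y))
        cx = int((W - cw) * (0.5 + 0.5 * a * pan_x))
        crop = video[t : t + 1, :, cy : cy + ch, cx : cx + cw]
        frames.append(F.interpolate(crop, size=(H, W), mode="bilinear",
                                    align_corners=False))
    # the (pan_x, pan_y, zoom) triple is the label, obtained for free
    return torch.cat(frames), torch.tensor([pan_x, pan_y, zoom])
```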
After self-supervised training, the module can interpret camera motion parameters, enabling quantitative control.
For object motion control, no additional datasets or training are required. Users simply draw boxes on the first and last frames, plus an intermediate trajectory, to define an object's motion.
Put simply, pixel-level self-attention amplification and suppression are applied directly at inference: the self-attention distribution of each object in each frame is modulated stage by stage over time, so the object is generated at the positions specified by the user's sequence of boxes, achieving motion-trajectory control.
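Here is a heavily simplified sketch of that idea for a single frame, boosting attention scores inside each user-drawn box and suppressing them outside; the function name, mask format, and bias magnitudes are all assumptions for illustration:

```python
import torch

def modulate_attention(attn_logits: torch.Tensor,
                       box_masks: torch.Tensor,
                       boost: float = 2.0,
                       damp: float = -2.0) -> torch.Tensor:
    """
    attn_logits: (heads, num_pixels, num_pixels) self-attention scores
                 for one frame (queries x keys over flattened pixels).
    box_masks:   (num_objects, num_pixels) bool, True inside each
                 object's user-drawn box for this frame.
    """
    inside = box_masks.any(dim=0).float()          # union of the object boxes
    bias = inside * (boost - damp) + damp          # boost inside, damp outside
    # bias the keys every query attends to, steering content into the boxes
    return attn_logits + bias[None, None, :]
```

Repeated at every denoising step, with the boxes interpolated from the user's first-frame box to the last-frame box along the drawn trajectory, this steers each object to its specified positions without any extra training.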
It is worth mentioning that camera movement control and object motion control are independent of each other, allowing separate or joint control.
How effective is Direct-a-Video?
The researchers verified the effectiveness of the method by comparing Direct-a-Video against multiple baselines.
Camera movement control evaluation
The comparison of Direct-a-Video against AnimateDiff and VideoComposer is as follows:
Direct-a-Video outperforms the baselines in both generation quality and camera movement control accuracy:
Object motion control evaluation
Direct-a-Video was compared with VideoComposer and Peekaboo to verify its control capability in scenes with multiple moving objects.
It beats VideoComposer in both generation quality and object motion control accuracy:
When netizens saw the results, they were blown away:
In addition to Runway, there is now a new option.
PS:
Runway Gen-2's "Motion Brush": paint over whatever you want to move, and you can also adjust parameters to control the motion direction:
Reference links:
[1]https://x.com/dreamingtulpa/status/1756246867711561897?s=20
[2]https://arxiv.org/abs/2402.03162
— End —