This domestic AI "director" is seriously impressive: camera and objects in short videos each move their own way | City University of Hong Kong & Kuaishou & Tianjin University
Qubit | Official account QbitAI
Xifeng, from Aofei Temple
Kuaishou has set its sights on AI video and helped develop an intelligent "director".
Direct-a-Video successfully decouples object motion from camera movement in AI-generated videos, greatly improving flexibility and controllability!
Don't believe it? Take a look at some of its work.
In the short videos, the camera's movement follows the director's instructions exactly: horizontal (X-axis), vertical (Y-axis), and zoom are all precisely controlled:
The AI director can also pull off fancier moves, panning the camera horizontally and vertically at the same time:
Mixed horizontal-pan and zoom motion is available too:
In addition, the director can have each "actor" in the video move along a user-drawn box:
Camera movement and actor movement can also be combined.
For example, as the big bear walks through space, the camera pans horizontally and vertically, producing the overall motion of the video:
Of course, the bear can also be moved from one spot to another by drawing boxes connected with an arrow:
You can even control the movement paths of multiple "actors" at the same time:
This is the effect of Direct-a-Video, a text-to-video generation framework jointly proposed by researchers from City University of Hong Kong, Kuaishou Technology, and Tianjin University.
How is it done?
Specifically, Direct-a-Video is divided into two stages: camera movement control is learned during the training phase, and object motion control is implemented during the inference phase.
For camera movement control, the researchers adopt the pre-trained ZeroScope text-to-video model as the base and introduce a new trainable temporal self-attention layer (the camera module). The pan and zoom parameters are mapped through Fourier encoding and an MLP into an embedding, which is injected into this layer.
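Below is a minimal PyTorch sketch of what such a camera module could look like, assuming three scalar camera parameters (x-pan, y-pan, zoom); the class and function names are illustrative and not taken from the official Direct-a-Video code:

```python
import math
import torch
import torch.nn as nn

def fourier_encode(x: torch.Tensor, num_bands: int = 8) -> torch.Tensor:
    """Map raw camera parameters to sin/cos Fourier features."""
    freqs = (2.0 ** torch.arange(num_bands, device=x.device)) * math.pi
    angles = x[..., None] * freqs                  # (B, 3, num_bands)
    feats = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return feats.flatten(-2)                       # (B, 3 * 2 * num_bands)

class CameraModule(nn.Module):
    """Trainable temporal self-attention layer conditioned on pan/zoom."""
    def __init__(self, dim: int, num_bands: int = 8, heads: int = 8):
        super().__init__()
        self.num_bands = num_bands
        # MLP turns Fourier features of (pan_x, pan_y, zoom) into an embedding
        self.mlp = nn.Sequential(
            nn.Linear(3 * 2 * num_bands, dim), nn.SiLU(), nn.Linear(dim, dim)
        )
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frames: torch.Tensor, cam_params: torch.Tensor):
        # frames: (B, T, dim) per-frame features; cam_params: (B, 3)
        cam_emb = self.mlp(fourier_encode(cam_params, self.num_bands))
        h = frames + cam_emb[:, None, :]           # inject embedding into each frame
        out, _ = self.attn(h, h, h)                # self-attention across time
        return frames + out                        # residual connection
```

Since the conditioning enters only through this added layer, the pre-trained base model can stay frozen while the camera module alone is trained.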
The training strategy is self-supervised learning with data augmentation, which lets the camera module be trained on limited data with no manual motion annotation.
Data augmentation, in layman's terms, means adding slightly modified copies of existing data, or creating new synthetic data from existing data, to increase the amount of training data.
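For camera motion, such augmentation can double as free supervision: shifting and scaling a crop window across existing frames fakes pans and zooms, and the augmentation parameters themselves become the training labels. The sketch below illustrates that idea under those assumptions; it is not the paper's actual pipeline:

```python
import torch
import torch.nn.functional as F

def synthesize_camera_motion(video, pan_x=0.2, pan_y=0.0, zoom=1.2):
    """video: (T, C, H, W); pan_* in [-1, 1]; zoom >= 1."""
    T, C, H, W = video.shape
    frames = []
    for t in range(T):
        a = t / max(T - 1, 1)                      # progress through the clip
        scale = 1.0 / (1.0 + a * (zoom - 1.0))     # crop shrinks as we zoom in
        ch, cw = int(H * scale), int(W * scale)
        # the crop center drifts linearly to fake a camera pan
        cy = int((H - ch) * (0.5 + 0.5 * a * pan_y))
        cx = int((W - cw) * (0.5 + 0.5 * a * pan_x))
        crop = video[t : t + 1, :, cy : cy + ch, cx : cx + cw]
        frames.append(F.interpolate(crop, size=(H, W), mode="bilinear",
                                    align_corners=False))
    # the (pan_x, pan_y, zoom) triple is the label, obtained for free
    return torch.cat(frames), torch.tensor([pan_x, pan_y, zoom])
```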
After self-supervised training, the module can interpret camera motion parameters, enabling quantitative control.
For object motion control, no additional datasets or training are required. Users simply draw boxes on the first and last frames, plus an intermediate trajectory, to define an object's motion.
Put simply, pixel-level self-attention amplification and suppression are applied directly at inference: the self-attention distribution of each object in each frame is modulated stage by stage over time, so the object is generated at the positions specified by the user's sequence of boxes, achieving motion-trajectory control.
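Here is a heavily simplified sketch of that idea for a single frame, boosting attention scores inside each user-drawn box and suppressing them outside; the function name, mask format, and bias magnitudes are all assumptions for illustration:

```python
import torch

def modulate_attention(attn_logits: torch.Tensor,
                       box_masks: torch.Tensor,
                       boost: float = 2.0,
                       damp: float = -2.0) -> torch.Tensor:
    """
    attn_logits: (heads, num_pixels, num_pixels) self-attention scores
                 for one frame (queries x keys over flattened pixels).
    box_masks:   (num_objects, num_pixels) bool, True inside each
                 object's user-drawn box for this frame.
    """
    inside = box_masks.any(dim=0).float()          # union of the object boxes
    bias = inside * (boost - damp) + damp          # boost inside, damp outside
    # bias the keys every query attends to, steering content into the boxes
    return attn_logits + bias[None, None, :]
```

Repeated at every denoising step, with the boxes interpolated from the user's first-frame box to the last-frame box along the drawn trajectory, this steers each object to its specified positions without any extra training.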
It is worth mentioning that camera movement control and object motion control are independent of each other, allowing separate or joint control.
How effective is Direct-a-Video?
The researchers verified the effectiveness of the method by comparing Direct-a-Video against multiple baselines.
Camera movement control evaluation
The comparison of Direct-a-Video against AnimateDiff and VideoComposer is as follows:
Direct-a-Video outperforms the baselines in both generation quality and camera movement control accuracy:
Object motion control evaluation
Direct-a-Video was compared with VideoComposer and Peekaboo to verify its control capability in scenes with multiple moving objects.
It beats VideoComposer in both generation quality and object motion control accuracy:
When netizens saw the results, they were blown away:
In addition to Runway, there is now a new option.
PS:
Runway Gen-2's "Motion Brush": paint over whatever you want to move, and you can also adjust parameters to control the motion direction:
Reference links:
[1]https://x.com/dreamingtulpa/status/1756246867711561897?s=20
[2]https://arxiv.org/abs/2402.03162
— End —