Event cameras meet AI video generation: HKU's CUBE framework, accepted at ICIP, enables controllable video generation without training
Contributed by the CUBE team
Quantum Bit | Public Account QbitAI
In this era of information explosion, how can we make AI-generated videos more creative and meet specific needs?
The latest research from the University of Hong Kong, "CUBE, an event-based, training-free controllable video generation framework", brings a new solution.
The framework leverages the ability of event cameras to capture dynamic edges, bringing AI video generation into a new dimension with both accuracy and efficiency. The paper's original title is "Controllable Unsupervised Event-based Video Generation".
It was accepted at the image processing conference ICIP, selected for an oral presentation, and invited for a talk at a WACV workshop.
What is an event camera?
Before we delve into the CUBE framework, let’s first get to know the event camera.
Unlike traditional cameras, which capture frames at fixed intervals, event cameras mimic the biological visual system and record only "events", i.e. changes in pixel brightness, as if capturing only the essence of the scene.
This not only effectively cuts down redundant data, but also significantly reduces energy consumption.
Especially in scenes with high-speed motion or large lighting changes, event cameras hold clear advantages over traditional cameras, and this unique "event data" is the core of the CUBE framework.
△ Left: taken with a normal camera; Right: taken with an event camera
Simply put, event cameras differ from ordinary cameras in that they capture dynamic detail at the edges of objects, like flashes of inspiration, saving considerable bandwidth and power.
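For intuition, here is a minimal sketch of what an event stream looks like as data. The (x, y, timestamp, polarity) layout is the standard event-camera representation; the specific numbers below are made up purely for illustration.

```python
import numpy as np

# A toy event stream: each event is (x, y, timestamp_us, polarity).
# Polarity is +1 when a pixel got brighter, -1 when it got darker.
# Values here are made up purely for illustration.
events = np.array([
    [120, 64, 1000, +1],
    [121, 64, 1020, +1],
    [121, 65, 1450, -1],
    [200, 30, 2310, +1],
], dtype=np.int64)

# Unlike a frame-based camera, which would transmit every pixel of every
# frame, only these sparse brightness changes are recorded.
print(f"{len(events)} events instead of full frames")
```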
The CUBE framework combines this "flash-like" edge data with text descriptions to synthesize videos that meet your needs, with no training required! This not only makes the generated scenes more appealing, but also improves video quality, temporal consistency, and text alignment.
Why use CUBE?
Other methods either require large amounts of training data or produce poor results. The CUBE framework addresses both problems and performs well across multiple metrics.
Whether measured by visual quality, text alignment, or inter-frame consistency, CUBE performs well.
Think of it this way: CUBE is like an event camera fitted with smart "filters" that make the resulting video not only vivid but also faithful to the description, for example letting Iron Man do the moonwalk on the road!
How does the CUBE framework work?
CUBE's full name is "Controllable, Unsupervised, Based on Events", that is, a controllable, training-free, event-based video generation framework.
It generates videos by extracting edge information from events and combining it with user-provided text descriptions. Methodologically, CUBE relies mainly on diffusion-based generation.
A diffusion model generates images by starting from random noise and gradually denoising it; the team further adapted this process so that video generation is conditioned on the edge data provided by "events".
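As a rough illustration of edge-conditioned diffusion, the sketch below generates a single frame from an edge map plus a text prompt using the Hugging Face diffusers ControlNet pipeline. This is not the authors' ControlVideo-based implementation; the model IDs and parameters here are assumptions for illustration only.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Pre-trained edge-conditioned ControlNet + Stable Diffusion backbone.
# (Model IDs are assumptions; CUBE itself builds on ControlVideo.)
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16).to("cuda")

# An edge map derived from the event stream (here simply loaded from disk).
edge_map = Image.open("event_edges.png")

# Denoising starts from random noise and is steered at every step
# by both the text prompt and the edge map.
frame = pipe(
    "Iron Man doing the moonwalk on the road",
    image=edge_map,
    num_inference_steps=20,
).images[0]
frame.save("frame_000.png")
```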
CUBE's core methodology
1. Edge extraction: The event stream records the trajectory of object motion, and CUBE's first task is to convert these events into edge information. The team designed an edge extraction module that divides the event data into multiple time windows and extracts the key spatial locations in each to form accurate edge maps (see the sketch after this list). These edge maps preserve the outlines of moving objects and also make video generation smoother.
2. Video generation: With the edge data in hand, CUBE combines it with the text description to generate video. The diffusion model's step-by-step denoising produces a sequence of frames matching the description, and interpolation is used to make the video smoother and more consistent. The process requires no large training set, because CUBE directly calls a pre-trained diffusion model to achieve high-quality generation.
3. Controllability and consistency: CUBE builds on the ControlVideo framework, which offers strong controllability: text descriptions steer the generated content so that every frame meets the specified requirements. Combining ControlVideo with CUBE's event-derived edges addresses the consistency shortfalls of traditional video generation methods, making the content more vivid and better aligned with the description.
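Below is a minimal sketch of the edge-extraction idea from step 1: events are split into equal time windows and each window is accumulated into a binary edge map. The binning and thresholding details are assumptions for illustration, not the paper's exact module.

```python
import numpy as np

def events_to_edge_maps(events, num_windows, height, width):
    """Split an event stream into time windows and accumulate each
    window into a binary edge map.

    events: array of shape (N, 4) with columns (x, y, timestamp, polarity).
    Returns an array of shape (num_windows, height, width).
    (A simplified illustration; the paper's module may differ.)
    """
    t = events[:, 2]
    # Assign each event to one of num_windows equal-length time windows.
    bins = np.linspace(t.min(), t.max() + 1, num_windows + 1)
    window_ids = np.digitize(t, bins) - 1

    edge_maps = np.zeros((num_windows, height, width), dtype=np.float32)
    for w in range(num_windows):
        sel = events[window_ids == w]
        # Count events at each pixel; any activity marks an edge location.
        np.add.at(edge_maps[w], (sel[:, 1], sel[:, 0]), 1.0)
    return (edge_maps > 0).astype(np.float32)
```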
CUBE Performance
In experiments, CUBE outperformed existing methods by a large margin, achieving strong results on multiple metrics including video quality, text alignment, and temporal consistency.
Quantitative experiments show that CUBE's inter-frame consistency and text alignment surpass those of ControlNet, ControlVideo, and other methods. The team also ran user preference tests, in which participants generally preferred the videos generated by CUBE.
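For reference, text alignment and inter-frame consistency are often measured with CLIP embeddings: text alignment as the average text-to-frame similarity, and consistency as the average similarity between consecutive frames. The sketch below follows that common recipe; the paper's exact evaluation protocol may differ.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(frames, prompt):
    """frames: list of PIL images; prompt: the text description.
    Returns (text_alignment, frame_consistency) as cosine similarities."""
    inputs = processor(text=[prompt], images=frames,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    # Normalize embeddings so dot products become cosine similarities.
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    text_alignment = (img @ txt.T).mean().item()
    frame_consistency = (img[:-1] * img[1:]).sum(dim=-1).mean().item()
    return text_alignment, frame_consistency
```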
Future Outlook
Of course, CUBE still has room to improve. Going forward, the team hopes to combine edge information with texture information to make the videos more detailed and realistic, and to explore applications in more fields, including real-time scenarios. The technology is suited not only to film and animation generation, but also to settings such as autonomous driving and surveillance that require rapid perception of dynamic environments.
CUBE is not just a piece of technology; it is a new exploration at the intersection of event cameras and AI video generation.
If you are also interested in AI video generation, the full paper and open-source code are available below.
Paper address:
https://ieeexplore.ieee.org/abstract/document/10647468
Code is open source:
https://github.com/IndigoPurple/cube