53 frames become 960 frames! AI allows you to create slow motion without an expensive high-speed camera, from Huawei | CVPR 2021
Fengse from Aofei Temple
QbitAI Report | Official Account QbitAI
Do you have to use an expensive high-speed camera to make a slow-motion video?
NO! You can use AI.
See? This is the effect achieved by AI!
Although it cannot compare with the thousands of frames per second of a real high-speed camera, it can easily convert a 53-frame-per-second video into 960 frames per second, without artifacts or noise.
After seeing the results, many netizens couldn't help saying: "I really want an in-depth tutorial", "Can you make this into an app?"...
This cool research result was accepted to CVPR 2021. The researchers are from the Huawei Zurich Research Center and the University of Zurich.
Of course, a special camera was also used
To achieve this effect, the researchers did not rely on the classic idea of guessing pixel motion from video optical flow alone. Instead, they used two cameras to capture the footage first.
One is a regular camera that records real frames at a low frame rate (20-60 FPS);
to achieve a slow-motion effect, at least 300 frames per second are needed, and a 20 FPS video carries too little information to be synthesized directly into slow motion.
What to do? Rely on another, special camera:
an event camera (also called a neuromorphic camera), which uses a new type of sensor to capture "events", that is, changes in pixel brightness.
Event cameras are still relatively new: there are many in laboratories, but they have not yet reached the market at scale, and they cost $2,000 or more each.
Because the information an event camera records is compressed, it can shoot at a lower resolution but a much higher rate; in other words, it sacrifices image quality in exchange for more motion information.
The resulting amount of information is enough for the AI to understand how pixels move, which facilitates the subsequent interpolation.
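For intuition, here is a minimal Python sketch (not the paper's code) of one common way to bin a raw event stream into a fixed-size tensor that a neural network can consume. The (x, y, t, polarity) event format and the number of time bins are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code): bin an event stream
# of (x, y, t, polarity) tuples into a "voxel grid" tensor so that a
# network such as a U-Net can consume it like an image stack.
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """events: (N, 4) array of (x, y, t, polarity), polarity in {-1, +1}."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 2]
    # Normalize timestamps into [0, num_bins - 1] and round to a bin index.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    bins = np.rint(t_norm).astype(int)
    for (x, y, _, p), b in zip(events, bins):
        grid[b, int(y), int(x)] += p  # accumulate signed brightness changes
    return grid

# Example: 1,000 random events on a 180x240 sensor, binned into 5 slices.
rng = np.random.default_rng(0)
ev = np.stack([rng.integers(0, 240, 1000),   # x
               rng.integers(0, 180, 1000),   # y
               np.sort(rng.random(1000)),    # t
               rng.choice([-1.0, 1.0], 1000)], axis=1)
print(events_to_voxel_grid(ev, num_bins=5, height=180, width=240).shape)
# -> (5, 180, 240)
```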
The question-mark part is the interpolated frame we want
With the two cameras synchronized, the combined captured content looks like this:
Once the footage is captured, machine learning can be used to make the most of the information from both cameras for interpolation.
The AI model the researchers propose here is called Time Lens, and it is divided into four modules.
First, the frame information and event information captured by the two cameras are fed into the first two modules: the warp-based interpolation module and the synthesis interpolation module.
The warp-based interpolation module uses a U-shaped network (U-Net) to convert the motion recorded in the events into an optical-flow representation, which is then used to warp the real frames into new candidate frames.
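As a rough illustration of what "warping a frame with optical flow" means, here is a hedged PyTorch sketch that backward-warps a frame with a flow field via grid_sample. The random flow stands in for flow estimated from events; this is an assumed illustration, not the authors' implementation.

```python
# Hedged sketch: backward-warp a frame to a target time using a dense
# optical-flow field (dx, dy) per pixel. The flow here is random and
# stands in for flow a U-Net would estimate from the event stream.
import torch
import torch.nn.functional as F

def warp_frame(frame, flow):
    """frame: (B, C, H, W); flow: (B, 2, H, W) in pixels."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)  # (1, 2, H, W)
    coords = base + flow                    # where to sample from
    # grid_sample expects coordinates normalized to [-1, 1].
    coords[:, 0] = coords[:, 0] / (w - 1) * 2 - 1
    coords[:, 1] = coords[:, 1] / (h - 1) * 2 - 1
    return F.grid_sample(frame, coords.permute(0, 2, 3, 1), align_corners=True)

frame = torch.rand(1, 3, 64, 64)        # a real frame from the RGB camera
flow = torch.randn(1, 2, 64, 64) * 2.0  # stand-in for event-based flow
print(warp_frame(frame, flow).shape)    # torch.Size([1, 3, 64, 64])
```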
The synthesis interpolation module also uses a U-Net: it places the events between two frames and directly generates a new possible frame for each event (at this point, two candidate frames exist for the same event).
This module handles new objects appearing between frames, as well as lighting changes (such as water reflections), very well.
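A toy stand-in for this idea might look like the sketch below: the two boundary frames and the event tensor are concatenated and passed through a small encoder-decoder (a drastically simplified U-Net; every layer size here is an assumption, not the paper's architecture).

```python
# Toy stand-in for the synthesis module: concatenate the two boundary
# frames with the event tensor and let a small encoder-decoder predict
# the in-between frame directly. Layer sizes are illustrative only.
import torch
import torch.nn as nn

class TinySynthesisNet(nn.Module):
    def __init__(self, event_bins=5):
        super().__init__()
        in_ch = 3 + 3 + event_bins  # left frame + right frame + event bins
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, left, right, events):
        x = torch.cat([left, right, events], dim=1)
        return self.decoder(self.encoder(x))  # directly synthesized frame

net = TinySynthesisNet()
left, right = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
events = torch.rand(1, 5, 64, 64)
print(net(left, right, events).shape)  # torch.Size([1, 3, 64, 64])
```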
However, at this point, the synthesized video may have a problem: noise.
This is where the third module comes in handy: it uses the new information from the synthesis module to refine the output of the warp-based module.
That is, it extracts the most valuable information from the two generated frames of the same event and performs warping refinement, using a U-Net once again to generate a third candidate version of the frame.
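Schematically (and purely as an assumed sketch, not the paper's exact method), the refinement step can be pictured as predicting a small residual correction for the warped candidate from the synthesized one:

```python
# Assumed sketch: refine the warped candidate with a residual predicted
# from both candidates, producing a third candidate frame.
import torch
import torch.nn as nn

refine = nn.Sequential(                   # toy stand-in for the U-Net
    nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
warped = torch.rand(1, 3, 64, 64)         # from the warp-based module
synthesized = torch.rand(1, 3, 64, 64)    # from the synthesis module
residual = refine(torch.cat([warped, synthesized], dim=1))
refined = (warped + residual).clamp(0, 1) # third candidate frame
print(refined.shape)                      # torch.Size([1, 3, 64, 64])
```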
Finally, these three candidate frames are fed into an attention-based averaging module.
This module takes the best parts of the three frame representations and combines them into a final frame.
Now that we have a high-quality frame for the first event between two frames, repeating this process for all the events provided by the event camera produces the final result we want.
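To make the "attention-based averaging" concrete, here is a hedged sketch: a small (assumed) network predicts per-pixel softmax weights over the three candidates, and the final frame is their weighted sum.

```python
# Hedged sketch of attention-based averaging: predict per-pixel weights
# over the three candidate frames and blend them into the final frame.
import torch
import torch.nn as nn

attn = nn.Sequential(                      # assumed stand-in network
    nn.Conv2d(9, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),        # one logit map per candidate
)

def fuse(c1, c2, c3):
    w = torch.softmax(attn(torch.cat([c1, c2, c3], dim=1)), dim=1)
    stack = torch.stack([c1, c2, c3], dim=1)     # (B, 3, C, H, W)
    return (w.unsqueeze(2) * stack).sum(dim=1)   # per-pixel weighted average

candidates = [torch.rand(1, 3, 64, 64) for _ in range(3)]
final = fuse(*candidates)   # repeat per target timestamp for the full video
print(final.shape)          # torch.Size([1, 3, 64, 64])
```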
And that's how you can create realistic slow-motion videos with AI. How about that?
Attached is a table of the parameters of the cameras used:
Achieving results that smartphones and other models cannot
Saying this AI model is effective is one thing; a comparison is needed to know for sure.
For example, the comparison above with DAIN (accepted to CVPR 2019), one of the best interpolation models, shows which one is better.
Its interpolation is also far cheaper computationally: at an image resolution of 640×480, a single interpolation on the researchers' GPU takes the DAIN model 878 milliseconds, while this AI takes only 138 milliseconds.
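If you want to reproduce this kind of per-interpolation timing on your own hardware, a simple measurement loop like the one below works; the model is just a placeholder here, and the millisecond figures above are the paper's, not something this snippet produces.

```python
# Simple latency measurement for a single 640x480 interpolation pass.
# The Conv2d is a placeholder for whatever interpolation model you load.
import time
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder model
x = torch.rand(1, 3, 480, 640)               # the paper's test resolution

with torch.no_grad():
    model(x)                                 # warm-up run
    start = time.perf_counter()
    for _ in range(10):
        model(x)
    elapsed_ms = (time.perf_counter() - start) / 10 * 1000
print(f"{elapsed_ms:.1f} ms per interpolation")
```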
Additionally, although it is not recommended, the model can generate slow motion even when the input video is only 5 FPS.
For comparative experiments against other models, interested readers can refer to the paper.
Finally, in the video introducing the results, the authors note once again that while it cannot match expensive professional equipment, this model at least achieves results that smartphones and other models cannot.
About the authors
First author Stepan Tulyakov is a machine learning researcher at the Huawei Zurich Research Center.
Co-first author Daniel Gehrig is a PhD student at the University of Zurich and holds a master's degree in mechanical engineering from ETH Zurich.
Paper address:
http://rpg.ifi.uzh.ch/docs/CVPR21_Gehrig.pdf
Open source address:
https://github.com/uzh-rpg/rpg_timelens
Reference Links:
[1] https://www.louisbouchard.ai/timelens/
[2] https://www.reddit.com/r/MachineLearning/comments/pm6s6h/news_make_slow_motion_videos_with_ai_timelens/
[3] https://www.youtube.com/watch?v=dVLyia-ezvo
-over-
This article is original content from [QbitAI], a signed account under NetEase News and NetEase's special content incentive plan. Reproduction without the account's authorization is prohibited.
QbitAI · Signed author on Toutiao
Tracking new trends in AI technology and products