Even scarier than AI video face-swapping! DeepMind's new AI can generate realistic videos

Publisher: EEWorld News | Latest update time: 2019-07-25 | Source: EEWORLD | Keywords: DeepMind


DeepMind researchers recently developed an artificial intelligence model called Dual Video Discriminator GAN (DVD-GAN). After learning from a dataset of YouTube videos, it can generate highly realistic and coherent 256 x 256 pixel videos of up to 48 frames.


The DVD-GAN paper, titled "Efficient Video Generation on Complex Datasets", was published on arXiv on July 15, 2019, US time.



It is more difficult for AI to fake videos than pictures


FaceApp, developed by Russian AI researchers, has recently become a big hit. The app uses artificial intelligence to change the age, appearance, hair color and gender in users' selfies, and can even generate photos of fictional people. It gives people a close-up taste of the fun that artificial intelligence can bring to everyday life.


But has anyone ever thought that these technologies could one day be applied to video?


BigGAN is DeepMind's image generator, capable of producing highly realistic images. DVD-GAN, also developed by DeepMind researchers, is the latest breakthrough in AI generation of video clips.


Generating natural videos is a major challenge for generative modeling, made harder by the increased data complexity and computational requirements, the researchers said in the paper.


As a result, earlier work on video generation has almost always either focused on relatively simple datasets or used limited temporal information to reduce the complexity of the task.


This time, DeepMind researchers focused on the tasks of video synthesis and video prediction, extending the power and realism of generative image models to video.


DVD-GAN: Based on the BigGAN model structure


The researchers built DVD-GAN on the BigGAN architecture and introduced a series of adjustments for video generation, which made it possible to train DVD-GAN on Kinetics-600.


Kinetics-600 is a training dataset compiled from 500,000 10-second high-resolution YouTube video clips, originally made for recognizing human actions, and is an order of magnitude larger than other commonly used corpora.


The researchers also said that the diversity of Kinetics-600 eases their concerns about overfitting, in which a model with too many parameters predicts known data well but predicts unseen data poorly.


DeepMind researchers also rely on generative adversarial learning to provide the learning signal for generating motion.

In addition, DVD-GAN has a separate Transformer module that allows learned information to propagate across the whole AI model.



Training takes 12 to 96 hours to generate videos


The research paper shows that after training for 12 to 96 hours on Google's third-generation TPU, DVD-GAN can successfully generate videos that contain the composition and movement of objects, as well as various complex textures.

The downside is that the videos generated by DVD-GAN are sometimes rather "weird": objects and human figures can have strange shapes, and human bodies can even stretch or shrink.


The researchers noted, however, that when DVD-GAN was evaluated on UCF-101, a smaller dataset of 13,320 human action videos, its generated samples achieved a state-of-the-art Inception Score of 32.97.
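The score reported for UCF-101 is an Inception Score, a metric that rewards samples whose predicted class is confident for each sample and varied across samples. A minimal sketch of the standard formula, the exponential of the mean KL divergence between each sample's class distribution and the marginal distribution, might look like the following. The function name `inception_score` and the toy probability vectors are illustrative assumptions; a real evaluation would obtain the class probabilities from a pretrained classifier rather than hand-written lists.

```python
import math

def inception_score(class_probs):
    """Compute exp(mean_x KL(p(y|x) || p(y))) from per-sample class probabilities.

    class_probs: list of probability vectors, one per generated sample
    (hypothetical input; normally produced by a pretrained classifier).
    """
    num_samples = len(class_probs)
    num_classes = len(class_probs[0])
    # Marginal class distribution p(y), averaged over all samples.
    marginal = [
        sum(probs[c] for probs in class_probs) / num_samples
        for c in range(num_classes)
    ]
    # Mean KL divergence; zero-probability terms contribute nothing.
    total_kl = 0.0
    for probs in class_probs:
        total_kl += sum(
            p * math.log(p / marginal[c])
            for c, p in enumerate(probs)
            if p > 0.0
        )
    return math.exp(total_kl / num_samples)

# Confident and diverse predictions: the score reaches the number of classes.
diverse = inception_score([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
# Identical predictions for every sample: the score falls to its minimum of 1.
collapsed = inception_score([[0.2, 0.5, 0.3]] * 4)
```

In this toy setting the score ranges from 1 (every sample looks the same to the classifier) up to the number of classes (every sample is classified confidently and all classes are covered equally), which is why a higher value is read as better sample quality and diversity.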

DeepMind researchers hope to further highlight the benefits of training generative models on large and complex video datasets, such as Kinetics-600.


“We envision DVD-GAN establishing a strong baseline on this dataset that will be used as a reference point for future generative modeling efforts,” the researchers said. “While much work remains to be done to consistently generate realistic videos in unconstrained settings, we believe DVD-GAN is an important step in that direction.”

A generative adversarial network (GAN) consists mainly of two parts: a generator, which produces samples, and a discriminator, which tries to distinguish the generated samples from real-world samples.
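The tug-of-war between the generator and the discriminator can be sketched with the losses from the original GAN formulation. This is a minimal illustration in plain Python, assuming the discriminator outputs a probability that its input is real. The function names and example numbers are hypothetical, and the article does not specify which loss DVD-GAN itself uses.

```python
import math

def discriminator_loss(real_probs, fake_probs):
    """Loss the discriminator minimizes: low when it rates real samples
    near 1 and generated samples near 0."""
    real_term = sum(math.log(p) for p in real_probs) / len(real_probs)
    fake_term = sum(math.log(1.0 - p) for p in fake_probs) / len(fake_probs)
    return -(real_term + fake_term)

def generator_loss(fake_probs):
    """Non-saturating generator loss: low when the discriminator is fooled
    into rating generated samples near 1."""
    return -sum(math.log(p) for p in fake_probs) / len(fake_probs)

# A discriminator that cannot tell the two apart outputs 0.5 everywhere.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates real from generated samples has a lower loss.
sharp = discriminator_loss([0.9, 0.8], [0.1, 0.2])
# The generator's loss is high when its samples are easily spotted as fake.
spotted = generator_loss([0.1, 0.2])
fooling = generator_loss([0.5, 0.5])
```

Training alternates between lowering the discriminator's loss and lowering the generator's loss, so each part improves by exploiting the other's weaknesses.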


GANs have been used for tasks such as converting text into scenes and generating artificial galaxy images. The researchers used a generative adversarial network called BigGAN, which is named for its large batch sizes and millions of parameters.


It is worth mentioning that DVD-GAN contains two discriminators. One is the spatial discriminator (D_S), which evaluates the content and structure of individual frames by randomly sampling full-resolution frames and processing them separately. The other is the temporal discriminator (D_T), which provides the learning signal for generating motion.
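A rough sketch of why splitting the work between two discriminators saves computation is given below. The 48-frame, 256 x 256 video size comes from the article. The number of sampled frames (`k=8`), the assumption that the temporal discriminator sees every frame at reduced resolution (`downsample=2`), and all function names are illustrative assumptions that the article does not state.

```python
import random

def sample_frame_indices(num_frames, k, rng):
    """Pick k distinct frame indices uniformly at random, in temporal order,
    as the spatial discriminator's input frames."""
    return sorted(rng.sample(range(num_frames), k))

def pixels_seen(num_frames, height, width, k, downsample):
    """Count the pixels each discriminator would process for one video."""
    # D_S: k frames at full resolution.
    spatial = k * height * width
    # D_T (assumed): all frames, each side shrunk by the downsample factor.
    temporal = num_frames * (height // downsample) * (width // downsample)
    # A single discriminator looking at every full-resolution frame.
    full = num_frames * height * width
    return spatial, temporal, full

rng = random.Random(0)  # fixed seed so the sketch is repeatable
indices = sample_frame_indices(48, 8, rng)
spatial, temporal, full = pixels_seen(48, 256, 256, k=8, downsample=2)
ratio = (spatial + temporal) / full
print(f"D_S sees frames {indices}")
print(f"D_S + D_T process {ratio:.0%} of the pixels in the full video")
```

Under these assumed settings the two discriminators together process well under half of the pixels that a single full-resolution video discriminator would, which is the kind of saving that makes longer and higher-resolution videos tractable.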




Conclusion: An attempt to generate highly realistic videos using AI


From BigGAN to FaceApp, researchers have produced many groundbreaking results in AI-generated images. In video, however, there have been few breakthroughs apart from the brief popularity of AI face-swapping.


DVD-GAN, which DeepMind researchers built on the BigGAN architecture and trained on the Kinetics-600 dataset, uses a computationally efficient decomposition of its discriminator to scale to longer and higher-resolution videos. Although the results are still imperfect, it is undoubtedly an important attempt to use AI to generate highly realistic videos.

