DeepMind researchers recently developed an artificial intelligence model called Dual Video Discriminator GAN (DVD-GAN), which learns from datasets of YouTube videos and can generate highly realistic, coherent 256 x 256 pixel videos of up to 48 frames.
The DVD-GAN paper, titled "Efficient Video Generation on Complex Datasets", was posted on arXiv on July 15, 2019 (US time).
It is more difficult for AI to fake videos than pictures
Recently, FaceApp developed by Russian AI researchers has become a big hit. This app can change the age, appearance, hair color and gender of users' selfies through artificial intelligence technology, and can even generate photos of fictional characters. This allows people to experience the fun that artificial intelligence technology brings to our lives up close.
But has anyone ever thought that these technologies could one day be applied to video?
Just as BigGAN, the image generator developed by DeepMind, can produce highly realistic images, DVD-GAN is the DeepMind researchers' latest breakthrough in generating video clips.
Generating natural videos is a major challenge for generative modeling, and is also plagued by increased data complexity and computational requirements, the researchers said in the paper.
As a result, earlier work on video generation has almost always either focused on relatively simple datasets or used limited temporal information to reduce the complexity of the task.
This time, DeepMind researchers focused on the tasks of video synthesis and video prediction, extending the powerful functions and realistic effects of the generative image model to the video field.
DVD-GAN: Based on the BigGAN model structure
The researchers built a DVD-GAN system based on the BigGAN model structure and introduced a series of adjustments for video generation, enabling DVD-GAN to be trained on Kinetics-600.
Kinetics-600 is a training dataset compiled from 500,000 10-second high-resolution YouTube video clips, originally made for recognizing human actions, and is an order of magnitude larger than other commonly used corpora.
The researchers also said that the diversity of Kinetics-600 eases their concerns about overfitting, in which a model with too many parameters fits the data it has already seen well but predicts unseen data poorly.
The DeepMind researchers also use generative adversarial learning to provide a learning signal for generating motion.
In addition, DVD-GAN has a separate Transformer module that lets learned information propagate across the whole model.
Generating videos takes 12 to 96 hours of training
The research paper shows that after training for 12 to 96 hours on Google's third-generation TPU, DVD-GAN can successfully generate videos that contain the composition and movement of objects, as well as various complex textures.
The downside is that the videos generated by DVD-GAN are sometimes rather "weird". For example, generated objects and human figures can be oddly shaped, and human bodies can even change in length from frame to frame.
Still, the researchers noted that when DVD-GAN was evaluated on UCF-101, a smaller dataset of 13,320 human action videos, the samples it generated reached a top Inception Score of 32.97.
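For context, the score cited for UCF-101 is the Inception Score, which rewards samples that a classifier labels confidently while the labels stay varied across samples. It is the exponential of the average KL divergence between each sample's predicted class distribution and the overall class distribution. The following is a minimal sketch in plain Python, assuming the per-sample class probabilities have already been produced by some pretrained classifier; the classifier is omitted, and the function name `inception_score` is illustrative rather than taken from the paper's code.

```python
import math


def inception_score(class_probs):
    """Inception Score from per-sample class probabilities p(y|x).

    class_probs: list of probability vectors, one per generated sample.
    Assumption: these come from a pretrained classifier that is not shown here.
    """
    n = len(class_probs)
    num_classes = len(class_probs[0])

    # Marginal class distribution p(y), averaged over all samples.
    marginal = [sum(p[c] for p in class_probs) / n for c in range(num_classes)]

    # Average KL(p(y|x) || p(y)) over samples; zero-probability terms contribute 0.
    mean_kl = 0.0
    for p in class_probs:
        mean_kl += sum(
            pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0
        )
    mean_kl /= n

    return math.exp(mean_kl)


# Confident and varied predictions score high; uniform predictions score 1.
confident_and_varied = inception_score([[1.0, 0.0], [0.0, 1.0]])
uninformative = inception_score([[0.5, 0.5], [0.5, 0.5]])
```

The score is bounded below by 1 and above by the number of classes, so on UCF-101, which has 101 action classes, it can range from 1 to 101.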
DeepMind researchers hope to further highlight the benefits of training generative models on large and complex video datasets, such as Kinetics-600.
“We envision DVD-GAN establishing a strong baseline on this dataset that will be used as a reference point for future generative modeling efforts,” the researchers said. “While much work remains to be done to consistently generate realistic videos in unconstrained settings, we believe DVD-GAN is an important step in that direction.”
A generative adversarial network (GAN) consists of two parts: a generator, which produces samples, and a discriminator, which tries to distinguish the generated samples from real-world samples.
GANs have been used in tasks such as converting text into scenes and generating artificial galaxy images. The researchers built on a GAN called BigGAN, which is known for its large batch sizes and millions of parameters.
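The contest between generator and discriminator described above can be written as a pair of loss functions. The sketch below uses the hinge formulation that BigGAN-style models commonly adopt; the article does not state which loss DVD-GAN uses, so treat this choice as an assumption. The scores are plain numbers standing in for discriminator outputs, and both function names are illustrative.

```python
def discriminator_hinge_loss(real_scores, fake_scores):
    """Hinge loss for the discriminator.

    The discriminator is penalized when real samples score below +1
    or generated samples score above -1.
    """
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_hinge_loss(fake_scores):
    """Hinge loss for the generator.

    The generator lowers its loss by raising the scores the
    discriminator assigns to generated samples.
    """
    return -sum(fake_scores) / len(fake_scores)


# Toy scores: the discriminator is fairly sure about one real and one fake sample.
d_loss = discriminator_hinge_loss([2.0, 0.5], [-2.0, 0.0])
g_loss = generator_hinge_loss([-2.0, 0.0])
```

In training, the two losses are minimized in alternation: the discriminator learns to separate real from generated samples, and the generator learns to produce samples the discriminator scores as real.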
It is worth mentioning that DVD-GAN contains two discriminators. The spatial discriminator (D_S) evaluates the content and structure of individual frames by randomly sampling full-resolution frames and processing each one separately. The temporal discriminator (D_T) provides the learning signal for generating motion.
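The division of labor between the two discriminators can be illustrated with a toy input pipeline. The sketch below shows only how the inputs might be prepared, not the networks themselves. The article does not say how the temporal discriminator keeps its cost down, so the sketch assumes it sees every frame at reduced spatial resolution; the frame count `k=8`, the pooling `factor=2`, the 8 x 8 toy frames, and all function names are illustrative assumptions.

```python
import random


def sample_frames_for_spatial_d(video, k, rng):
    """Pick k distinct frames at random; each stays at full resolution."""
    indices = sorted(rng.sample(range(len(video)), k))
    return [video[i] for i in indices]


def downsample_for_temporal_d(video, factor):
    """Average-pool every frame by `factor` in height and width (assumed step).

    All frames are kept, so motion across time remains visible.
    Frame height and width must be divisible by `factor`.
    """
    pooled = []
    for frame in video:
        height, width = len(frame), len(frame[0])
        pooled.append([
            [
                sum(frame[y + dy][x + dx]
                    for dy in range(factor) for dx in range(factor)) / factor ** 2
                for x in range(0, width, factor)
            ]
            for y in range(0, height, factor)
        ])
    return pooled


def pixel_count(video):
    """Total number of pixels across all frames."""
    return sum(len(frame) * len(frame[0]) for frame in video)


# Toy video: 48 single-channel frames of 8 x 8 pixels, frame t filled with value t.
rng = random.Random(0)
video = [[[float(t) for _ in range(8)] for _ in range(8)] for t in range(48)]

spatial_input = sample_frames_for_spatial_d(video, k=8, rng=rng)
temporal_input = downsample_for_temporal_d(video, factor=2)
```

Under these assumptions, the two discriminators together process fewer pixels than a single discriminator looking at every full-resolution frame would, which is the kind of saving a decomposed discriminator is meant to deliver.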
Conclusion: An attempt to generate highly realistic videos using AI
From BigGAN to FaceApp, researchers have produced many groundbreaking results in AI-generated images. In video, however, apart from the brief popularity of AI face swapping, there have been few breakthroughs.
DVD-GAN, which DeepMind researchers built on the BigGAN architecture and trained on the Kinetics-600 dataset, uses a computationally efficient decomposition of the discriminator to scale to longer and higher-resolution videos. Although the results are still imperfect, the work is undoubtedly an important attempt to use AI to generate highly realistic videos.