The bolder the person, the more productive the GAN | The full evolution of an AI creative tool

Last updated: 2020-01-30

Lai Ke from Aofei Temple
Quantum Bit Report | Public Account QbitAI

Since its birth, the GAN has been upgraded continually and has grown ever more capable.

How did this powerful method evolve?

The birth and construction of GAN

GAN was born in 2014, when Ian Goodfellow and his colleagues published a paper titled "Generative Adversarial Nets".

The framework of GAN was established from then on.

It consists of two parts, a generator and a discriminator, and is trained in an unsupervised manner.

The generator samples data and produces new synthetic samples, which are mixed in with the original data and sent to the discriminator. The discriminator tries to tell which samples are original and which were synthesized. This process repeats until the discriminator can no longer distinguish real samples from synthetic ones with better than 50% accuracy, that is, no better than a coin flip.
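The stopping point described above can be illustrated with the value function from the Goodfellow et al. paper, V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]. The following is a minimal sketch, not a training loop: the discriminator scores are made-up numbers, and the function names `gan_value` and `discriminator_accuracy` are chosen here for illustration only.

```python
import math

def gan_value(d_real, d_fake):
    """Estimate V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] from discriminator scores."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def discriminator_accuracy(d_real, d_fake, threshold=0.5):
    """Fraction of samples classified correctly; a score above threshold means 'real'."""
    correct = sum(p > threshold for p in d_real) + sum(p <= threshold for p in d_fake)
    return correct / (len(d_real) + len(d_fake))

# Made-up scores for a confident discriminator early in training.
early_real, early_fake = [0.9, 0.8, 0.95, 0.85], [0.1, 0.2, 0.05, 0.15]
# A discriminator that can no longer tell the two apart outputs 0.5 for everything.
late_real, late_fake = [0.5] * 4, [0.5] * 4

print(discriminator_accuracy(early_real, early_fake), gan_value(early_real, early_fake))
print(discriminator_accuracy(late_real, late_fake), gan_value(late_real, late_fake))
```

When every score is 0.5, the accuracy is exactly 50% and the value equals -log 4, the value the paper derives for the case where the generator's distribution matches the data.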

In practice, the GAN architecture also has some drawbacks.

First, training the generator and discriminator simultaneously is inherently unstable. After each parameter update, the nature of the optimization problem changes, so the parameter values inside the model can oscillate or become unstable. In more serious cases, the generator collapses and spits out a large number of samples that all look the same, a failure known as mode collapse.
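One crude way to notice a generator that emits near-identical samples is to measure how spread out a batch of its outputs is. The sketch below uses the mean pairwise Euclidean distance; this is an illustrative heuristic chosen here, not a metric taken from the article or its sources, and the sample batches are invented.

```python
import itertools
import math

def mean_pairwise_distance(samples):
    """Average Euclidean distance over all pairs of samples (needs at least two)."""
    pairs = list(itertools.combinations(samples, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Invented 2-D "samples": a varied batch and a collapsed batch.
diverse = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

print(mean_pairwise_distance(diverse))    # clearly above zero
print(mean_pairwise_distance(collapsed))  # zero: every sample is identical
```

A value that drops toward zero over the course of training would suggest the generator is losing diversity, although a real check would compare against the spread of the training data.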

Second, there is a risk that either the generator or the discriminator will overpower the other. If the generator is too accurate, it will exploit the discriminator's weaknesses instead of fooling the discriminator by generating more realistic images; if the discriminator is too accurate, it will hinder the generator's convergence.

Finally, a lack of training data can also hold back GANs' progress on semantics.

However, Hanlin Tang, senior director of Intel AI Labs, said that new techniques are emerging to address these limitations. He described two approaches. One is to put multiple discriminators into a model and fine-tune them on specific data. The other is to feed the discriminators dense embedding representations, that is, numerical representations of the data, so that they have more information to draw on.

Applications of GAN: from images to speech

1. Image

The most common and famous application of GAN is to synthesize realistic images.

For example, NVIDIA's StyleGAN can transfer the facial features of person B to person A.

In addition to faces, other objects can also be transferred. Scientists at Carnegie Mellon University have developed Recycle-GAN, which can transfer the content of one video or photo to another.

Examples include transferring between human faces and animated faces, or having one flower imitate the blooming motion of another.

2. Video

One step beyond images is video. DeepMind developed DVD-GAN.

It was trained on a dataset of 500,000 10-second high-resolution videos collected from YouTube, and can generate 256 x 256 pixel videos up to 48 frames long.

3. Music

In addition to creating photos, GAN can also be used to compose music.

Amazon's DeepComposer keyboard works on the same principle as a GAN.

Given a simple input melody, the generator creates samples based on random data, and the discriminator judges them. The two improve over repeated rounds, and eventually a piece of music is produced.

4. Voice

GANs are not yet widely used for speech. Researchers from Google and Imperial College London jointly developed GAN-TTS, a system that uses a GAN to convert text into natural, realistic speech.

There are 10 discriminators in this system, some of which are responsible for determining whether the output speech and text are consistent, while others only focus on whether the speech is real and natural.
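The division of labor described above, with some discriminators checking speech against text and others judging only the speech, can be sketched as a generator loss summed over an ensemble. This is a toy illustration under stated assumptions: the two discriminators are hypothetical stand-ins that return fixed probabilities, and the -log D(...) loss is the common non-saturating GAN generator loss, used here for simplicity rather than as a claim about the loss GAN-TTS actually uses.

```python
import math

def generator_loss(speech, text, conditional_ds, unconditional_ds):
    """Sum the generator loss -log D(...) over every discriminator in the ensemble."""
    scores = [d(speech, text) for d in conditional_ds]   # do speech and text match?
    scores += [d(speech) for d in unconditional_ds]      # does the speech sound real?
    return sum(-math.log(s) for s in scores)

# Hypothetical toy discriminators: each returns a probability in (0, 1).
def toy_conditional(speech, text):
    # Pretend "consistent" means one audio frame per character of text.
    return 0.9 if len(speech) == len(text) else 0.1

def toy_unconditional(speech):
    # Ignores the text entirely and always finds the speech fairly realistic.
    return 0.8

matched = generator_loss([0.1, 0.2], "hi", [toy_conditional], [toy_unconditional])
mismatched = generator_loss([0.1, 0.2, 0.3], "hi", [toy_conditional], [toy_unconditional])
print(matched, mismatched)
```

Because each discriminator contributes its own term, speech that sounds realistic but does not match the text is still penalized by the conditional discriminators.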

5. Spam comment detection

To tackle the problem of machine-generated fake comments posted online, researchers developed spamGAN, which detects spam comments.

spamGAN uses a technique called semi-supervised learning, in which unlabeled data is used together with a small amount of labeled data.

When trained on as little as 10% of the labeled data, it reached 71% to 86% accuracy.
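The general idea of semi-supervised learning can be sketched as a loss with two parts: a supervised term computed only on the examples that have labels, and an unsupervised term computed on every example. This is a generic illustration, not spamGAN's actual objective; the function name, the `weight` parameter, and the per-example `unsupervised_terms` values are hypothetical placeholders.

```python
import math

def semi_supervised_loss(predictions, labels, unsupervised_terms, weight=1.0):
    """Cross-entropy on labeled examples plus a weighted label-free term on all examples.

    predictions: predicted probability of the "spam" class for each example
    labels: 1 (spam), 0 (not spam), or None when the example is unlabeled
    unsupervised_terms: per-example loss that needs no label (placeholder values)
    """
    labeled = [(p, y) for p, y in zip(predictions, labels) if y is not None]
    supervised = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in labeled
    ) / len(labeled)
    unsupervised = sum(unsupervised_terms) / len(unsupervised_terms)
    return supervised + weight * unsupervised

# Ten comments, only one of which (10%) carries a label.
predictions = [0.9] + [0.5] * 9
labels = [1] + [None] * 9
unsupervised_terms = [0.2] * 10

loss = semi_supervised_loss(predictions, labels, unsupervised_terms)
print(loss)
```

The unlabeled comments still shape the loss through the second term, which is what lets a small labeled fraction go further than it would in purely supervised training.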

The future of GAN: fine-grained control

Although GANs have made a lot of progress, Hanlin Tang of Intel Labs said it is still early days.

GANs still lack very fine-grained control, which is a big challenge.

On the computing side, some researchers are also trying lightweight models.

Youssef Mroueh, a researcher in IBM's Multimodal Algorithms and Engines group, and his colleagues are developing small GANs to reduce training time and memory usage.

The question they are trying to answer is how the model should change if we want to avoid so much computation and so many cumbersome steps. This is the direction they are working on now.

References:
https://venturebeat.com/2019/12/26/gan-generative-adversarial-network-explainer-ai-machine-learning/
https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

The author is a contracted author of NetEase News and NetEase "Each has its own attitude"
