

A clean win over ReLU! Stanford's neural network uses this activation function to reconstruct images, audio, and video with high fidelity

Xiaoxiao, reporting from Aofei Temple
QbitAI Report | WeChat official account QbitAI

A simple neural network architecture called SIREN, developed by a Stanford team, has just set off a wave of buzz in the machine learning community.

The reason is simple: the impressive audio, video, and image reconstruction results it demonstrates (in the demos, Ground Truth refers to the original video, audio, or image data).


The video reconstructs the movements of a cat; the version reconstructed with SIREN is noticeably sharper.

As the figure shows, SIREN's reproduction of the audio is almost indistinguishable from the original.


SIREN is also efficient at reconstructing images: as the video shows, it reaches a high-fidelity reconstruction within very few iterations.

Why does this neural network architecture perform so well?

Continue reading below.

Storing image data in continuous form

Data obtained through sampling is usually discrete, which means that reconstructing an image requires representing it through interpolation.

Moreover, storing large amounts of raw data in discrete form also takes up considerable space.

SIREN makes a breakthrough in this direction by replacing common nonlinear activation functions (such as ReLU and tanh) with the periodic sine function, so that data can be stored in a continuous form.

Unlike non-periodic activation functions such as ReLU and tanh, SIREN uses the sinusoidal function as its activation, which effectively introduces periodicity into the neural network.

Because the sinusoidal activation is smooth and differentiable everywhere, the network can quickly fit complex signals, parameterize the space of natural images well, and model fine detail more precisely.

This means the function can not only represent the image in a continuous manner, but also render it at any resolution without losing information.
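To make the idea concrete, here is a minimal sketch of a SIREN-style network in PyTorch that maps continuous pixel coordinates to intensities; it is only illustrative, not the team's released code. The scaling factor ω₀ = 30 and the uniform initialization bounds follow the paper's description, while the class and variable names are purely illustrative.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """A linear layer followed by a sine activation: y = sin(omega_0 * (W x + b))."""
    def __init__(self, in_features, out_features, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_features, out_features)
        # Initialization scheme described in the SIREN paper: the first layer is drawn
        # from U(-1/n, 1/n), later layers from U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0).
        with torch.no_grad():
            n = in_features
            if is_first:
                self.linear.weight.uniform_(-1.0 / n, 1.0 / n)
            else:
                bound = (6.0 / n) ** 0.5 / omega_0
                self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))

class Siren(nn.Module):
    """Maps continuous coordinates (e.g. pixel positions in [-1, 1]^2) to signal values."""
    def __init__(self, in_features=2, hidden_features=256, hidden_layers=3, out_features=1):
        super().__init__()
        layers = [SineLayer(in_features, hidden_features, is_first=True)]
        layers += [SineLayer(hidden_features, hidden_features) for _ in range(hidden_layers - 1)]
        layers += [nn.Linear(hidden_features, out_features)]
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        return self.net(coords)

# Fitting an image: train so that model(coords) matches the pixel value at each coordinate.
# model = Siren(); loss = ((model(coords) - pixels) ** 2).mean()
```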

This continuous representation works not only for images but also for 3D models. So, can SIREN reconstruct a building in 3D?

The modeling results compare roughly like a fully finished house versus a bare-shell one.

Even when the original data is clearly discrete, storing it in continuous form has the benefit that sampling issues no longer need to be a concern.

The impact of this on data compression and image restoration research cannot be ignored.

Gradient-based supervised learning

Have you ever thought about supervising a neural network through the derivatives of the function it represents?

SIREN did just that.

It can do this because the derivative of the sinusoidal activation is still periodic (the cosine function); in other words, the derivative of a SIREN is itself a SIREN.
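In formula form: d/dx sin(ω₀x + b) = ω₀ cos(ω₀x + b) = ω₀ sin(ω₀x + b + π/2), so differentiating a sine layer simply gives back a scaled, phase-shifted sine layer.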

The derivative of a SIREN therefore inherits SIREN's properties, which makes it possible to supervise any order of derivative of a SIREN with complex signals during training.

The figure below shows SIREN fitting a starfish image supervised with gradients or with Laplacians (the green underline indicates which supervision is used).

Compared with the ground truth on the left, both reconstructions are good: the middle image is reconstructed using gradient supervision, while the image on the right is reconstructed using Laplacian supervision.

The results show that SIREN still performs well under derivative supervision, which makes it very effective for solving boundary value problems (BVPs).
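As a rough sketch of what derivative supervision looks like in code (reusing the illustrative Siren model above; coords and grad_target are hypothetical names for the sampled pixel coordinates and the target gradient field), the network's output is differentiated with respect to its input coordinates via autograd and penalized against the target:

```python
import torch

def gradient_supervision_loss(model, coords, grad_target):
    """Fit the spatial gradient of the SIREN output to a target gradient field."""
    coords = coords.clone().requires_grad_(True)    # (N, 2) pixel coordinates
    output = model(coords)                          # (N, 1) predicted intensities
    grad = torch.autograd.grad(
        outputs=output,
        inputs=coords,
        grad_outputs=torch.ones_like(output),
        create_graph=True,  # keep the graph so we can backpropagate through this gradient
    )[0]                                            # (N, 2) gradient w.r.t. (x, y)
    return ((grad - grad_target) ** 2).mean()
```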

On top of that, SIREN converges faster than other architectures, often reaching a high-fidelity image reconstruction within a few seconds on a GPU.

A breakthrough innovation, or still limited?

Keep in mind that the periodic sinusoidal activation is used here as the basis of an implicit neural representation.

Implicit representation is the counterpart of explicit representation. In an explicit representation, the function is written purely in terms of its independent variables; in an implicit representation, the function value and the independent variables are tangled together and cannot be cleanly separated.

Example of an implicit representation: f(x) = [f(x)]^2 + x (the expression still contains f(x))

Example of explicit representation: f(x) = x + 2
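In the paper's own framing, an implicit neural representation is a function Φ(x) that is defined only through a relation of the form F(x, Φ, ∇Φ, ∇²Φ, ...) = 0; fitting an image, supervising its gradients or Laplacian, and solving boundary value problems are all special cases of this equation.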

Research on implicit neural representations, as opposed to explicit ones, has only emerged in recent years. Such functions can express richer and more diverse relationships, but earlier architectures struggled to model the fine details of the signals.

The Stanford team's work can be seen as a breakthrough in implicit neural representation: by adopting a periodic activation function together with an appropriate initialization scheme, it achieved strong results.

When the results were first released, many netizens commented on the work on Twitter, most of them expressing amazement.

This netizen is obviously very enthusiastic about the research results.

Shocking! A must-read! No time to explain, hop on board! This is Vincent's unparalleled work!

Some netizens have begun to reconsider the status of ReLU in today's neural networks.

Is this periodic sine activation function the new “ReLU”?

Some netizens also argued that the idea of using sinusoidal activation functions in neural networks has had a greater impact on machine learning than the paper itself.

The idea of using sinusoidal activations for neural networks may have had a greater impact on machine learning than this poorly written paper.

Some netizens also believe that this study still has limitations.

I was amazed at the results that neural networks with sinusoidal activations showed, especially in that they can represent images and videos very accurately compared to the old ReLU. However, representing neural PDEs in this way still doesn't work very well compared to SOTA AFAIK.

On Reddit, one netizen bluntly raised doubts after reading the paper carefully.

I think there are many unexplained holes in the paper, which greatly reduce the credibility of its conclusions... (lists six specific questions)
My view is that although the paper felt novel to me, the authors did not actually put much effort into substantiating its claims or assessing the practical value of the results.

Some netizens immediately expressed their agreement.

These image reconstruction papers that report results on MNIST or CelebA are very misleading. There is no example showing that these neural network algorithms can be used for real-world image processing.

From this point of view, the practical application value of this research may require more consideration.

Some knowledgeable netizens have already produced a detailed walkthrough of the paper, presenting its core content in a clear, accessible way in under an hour.

The timeline of the paper analysis is as follows. If you are interested in any part of it, you can check it out~ (The video link is at the end of the article)

0:00 - Overview
2:15 - Implicit Neural Representation
9:40 - Image Example
14:30 - SIREN Network
18:05 - Initialization Scheme
20:15 - SIREN Derivatives
23:05 - Poisson Image Reconstruction
28:20 - Poisson Image Editing
31:35 - Signed Distance Function (SDF)
45:55 - Research Website
48:55 - Other Applications
50:45 - Hypernetworks in SIREN
54:30 - Broad Impact

Research team

Vincent Sitzmann


The lead author, Vincent Sitzmann, is a recent PhD graduate from Stanford University and is currently a postdoctoral researcher at MIT. His main research areas include neural scene representation, computer vision, and deep learning.

The research team is made up of PhD-level researchers who have done in-depth work in computer vision.

As computer vision advances, the industry hopes that machines will go far beyond "simply reproducing two-dimensional images like a camera" and instead gain human-like visual perception.

Portal

Paper link:
https://arxiv.org/pdf/2006.09661.pdf

Project introduction
https://vsitzmann.github.io/siren/

Paper analysis
https://www.youtube.com/watch?v=Q5g3p9Zwjrk&feature=youtu.be


-over-

The "Database" series of open courses is now open, come and sign up for free!

In the second live broadcast on June 23, Qiao Xin, general manager of the database product line of Inspur Information , shared "Data platform upgrade under the traditional enterprise Internet" and talked about technical issues such as the technical principles, optimization solutions and development and deployment outline of HTAP database, so as to provide some forward-looking guidance to the wide audience.

Scan the QR code to sign up and join the live exchange group. You can also get the live replay of the series of courses and share PPT:


Quantum Bit QbitAI · Toutiao signed author


Tracking new trends in AI technology and products


If you like it, click "Watching"!



Featured Posts


Latest articlesabout

 
EEWorld WeChat Subscription

 
EEWorld WeChat Service Number

 
AutoDevelopers

About Us About Us Service Contact us Device Index Site Map Latest Updates Mobile Version

Site Related: TI Training

Room 1530, Zhongguancun MOOC Times Building,Block B, 18 Zhongguancun Street, Haidian District,Beijing, China Tel:(010)82350740 Postcode:100190

EEWORLD all rights reserved 京B2-20211791 京ICP备10001474号-1 电信业务审批[2006]字第258号函 京公网安备 11010802033920号 Copyright © 2005-2021 EEWORLD.com.cn, Inc. All rights reserved