Saving thousands of grad students from dire straits! Facebook open-sources a gradient-free optimization tool
Qian Ming from Aofei Temple
Produced by Quantum Bit | Public Account QbitAI
What is the hardest part of machine learning? Tuning the parameters, probably!
Many graduate students get stuck at this stage and can't graduate for years.
Now, before Christmas, there is good news!
Facebook announced that it has open-sourced its own gradient-free optimization tool: Nevergrad.
And they "solemnly" stated that this would make the process of adjusting model parameters and hyperparameters faster and easier.
But on Twitter, it seemed to have the opposite effect.
After seeing the news, a joke immediately popped into some people's heads:
Never Graduate?
Some people also joked that this was definitely a nightmare for those who have studied for a PhD for six years.
Joking aside, though, the name really does stand for "Never Gradient."
Most people, however, simply showed their approval by retweeting and liking.
What is this?
In short, it is a Python 3 library containing a range of algorithms that do not require gradient computation, including:
- Differential evolution
- Sequential quadratic programming
- FastGA
- Covariance matrix adaptation
- Population control methods for noise management
- Particle swarm optimization
- …
All of them are exposed through a standard ask-and-tell Python interface, and Facebook also provides the relevant testing and evaluation tools.
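To give a feel for this interface, here is a minimal sketch of the ask-and-tell loop on a toy objective. The module paths follow recent Nevergrad documentation and may differ between versions (early releases used a separate optimizerlib module); OnePlusOne is simply one of the registered optimizers.

```python
import nevergrad as ng

# Toy objective: only function evaluations are needed, no gradients.
def square(x):
    return sum((xi - 0.5) ** 2 for xi in x)

# A 2-dimensional continuous problem with a budget of 100 evaluations.
optimizer = ng.optimizers.OnePlusOne(parametrization=2, budget=100)

# Ask-and-tell loop: ask for a candidate, evaluate it, report the loss back.
for _ in range(optimizer.budget):
    candidate = optimizer.ask()
    loss = square(candidate.value)
    optimizer.tell(candidate, loss)

recommendation = optimizer.provide_recommendation()
print(recommendation.value)  # should end up close to [0.5, 0.5]
```

The same loop works unchanged for any other registered optimizer, which is the point of the common interface.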
Do you feel like tears welling up in your eyes?
Don't get too excited yet… let's see how it works.
Just pick it up and use it
Start with the algorithms listed above. Previously, each of them required a custom implementation, so comparing how different algorithms performed on the same task was either impossible or took a lot of effort.
Facebook says that with Nevergrad, none of this is a problem: whenever you need an algorithm, you can simply pick it up and use it.
Not only can you compare the performance of different methods, you can also compare them against the state of the art on well-known benchmarks, helping you find the best optimization method for your specific use case.
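As a rough illustration of such a comparison (not one of Facebook's official benchmarks), a sketch like the following runs a few registered optimizers on the same toy multimodal objective; the optimizer names are taken from Nevergrad's registry, and their availability may vary by version.

```python
import math
import nevergrad as ng

def objective(x):
    # A small multimodal toy function with many local minima.
    return sum(xi ** 2 + 0.3 * math.cos(10 * xi) for xi in x)

# Run several registered optimizers on the same problem and compare results.
for name in ["OnePlusOne", "CMA", "PSO", "DE", "TwoPointsDE"]:
    optimizer = ng.optimizers.registry[name](parametrization=5, budget=500)
    recommendation = optimizer.minimize(objective)
    print(f"{name:12s} -> {objective(recommendation.value):.4f}")
```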
The application scenarios are quite touching
Let's talk about Facebook first. According to the blog post, its research team has already used Nevergrad in reinforcement learning, image generation, and various other projects.
Moreover, Nevergrad's gradient-free optimization can be widely used in various machine learning problems. For example:
- Multimodal problems, i.e., problems with several minima (for example, hyperparameter tuning of deep learning models for language modeling).
- Ill-conditioned problems, which typically arise when trying to optimize several variables with very different dynamics (for example, dropout and learning rates that have not been recalibrated for the specific problem).
- Separable or rotated problems, including partially rotated problems.
- Partially separable problems, which can be tackled by considering several blocks of variables. Examples include architecture search for deep learning or other forms of design, and the parameterization of multi-task networks.
- Discrete, continuous, or mixed problems, such as choosing the learning rate, the weight decay, and the type of nonlinearity for each layer (see the parametrization sketch after the summary below).
- Noisy problems, where calling the function with exactly the same arguments can return different results, as with independent episodes in reinforcement learning.
So, to summarize.
In machine learning, Nevergrad can be used to tune hyperparameters such as learning rate, momentum, weight decay (possibly per layer), dropout, and layer parameters for each part of a deep network.
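For the mixed discrete/continuous case above (a per-layer learning rate, weight decay, and choice of nonlinearity), a hedged sketch of how such a search space could be expressed with Nevergrad's parametrization tools is shown below. The ng.p.* classes come from recent versions of the library (older releases exposed similar functionality through an instrumentation module), and train_and_eval is a hypothetical placeholder for your own training routine.

```python
import nevergrad as ng

def train_and_eval(learning_rate, weight_decay, activation):
    # Hypothetical placeholder: in practice, train a network with these
    # settings and return a validation loss to be minimized.
    return learning_rate * 10 + weight_decay * 100 + (0.0 if activation == "relu" else 0.1)

# Mixed search space: log-scaled continuous values plus a discrete choice.
parametrization = ng.p.Instrumentation(
    learning_rate=ng.p.Log(lower=1e-5, upper=1e-1),
    weight_decay=ng.p.Log(lower=1e-6, upper=1e-2),
    activation=ng.p.Choice(["relu", "tanh", "elu"]),
)

optimizer = ng.optimizers.OnePlusOne(parametrization=parametrization, budget=100)
recommendation = optimizer.minimize(train_and_eval)
print(recommendation.kwargs)  # best hyperparameters found
```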
And since these are gradient-free methods, they can also be applied to power grid management, aeronautics, lens design, and many other science and engineering problems.
To demonstrate the capabilities of Nevergrad, the Facebook team implemented several benchmarks using Nevergrad.
Hardcore Example: Generating Algorithm Benchmarks with Nevergrad
Different examples correspond to different settings (multimodal or not, noisy or not, discrete or not, ill-conditioned or not), and show how to use Nevergrad to determine the best optimization algorithm.
In each benchmark, they performed independent experiments for different values of x. This ensured that consistent rankings between methods at several values of x were statistically significant.
△ Noise optimization example
This example shows that the TBPSA noise-management approach (based on pcCMSA-ES) outperforms several alternatives.
For the specific comparison, Facebook has open-sourced it on GitHub, and the link is at the end of the article.
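As a rough illustration of the noisy setting (not a reproduction of that benchmark), a sketch like this applies TBPSA, one of the registered optimizers, to a toy objective whose evaluations are corrupted by Gaussian noise:

```python
import random
import nevergrad as ng

def noisy_square(x):
    # The same arguments can return different values: evaluation noise.
    return sum((xi - 0.5) ** 2 for xi in x) + random.gauss(0.0, 0.1)

# TBPSA is aimed at noisy objectives; a larger budget lets its population
# control average the noise out.
optimizer = ng.optimizers.TBPSA(parametrization=2, budget=2000)
recommendation = optimizer.minimize(noisy_square)
print(recommendation.value)
```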
Nevergrad can also handle discrete objective functions, which is a problem encountered in many machine learning cases.
For example, choosing within a finite set of options (such as the activation function in a neural network), or choosing between different types of layers (e.g., deciding whether a skip connection is needed at some point in the network).
Some existing tools, such as BBOB and CUTEst, do not include any discrete benchmarks, but Nevergrad can handle discrete domains.
It does so in two ways: either through a softmax function (which turns the discrete problem into a noisy continuous one) or through discretization of continuous variables.
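In recent versions of the library, these two approaches correspond roughly to two parametrization classes, Choice (softmax-based sampling) and TransitionChoice (discretization of an underlying continuous variable); earlier releases exposed similar variables under different names, and the mapping to the blog's two methods is my reading rather than an official statement. A small sketch:

```python
import nevergrad as ng

# Softmax-style handling: the option is sampled from continuous weights,
# turning the discrete choice into a noisy continuous problem.
softmax_activation = ng.p.Choice(["relu", "tanh", "sigmoid"])

# Discretization-style handling: an underlying continuous variable is
# mapped onto ordered options.
ordered_activation = ng.p.TransitionChoice(["relu", "tanh", "sigmoid"])

# Either variable can be plugged into an optimizer; a toy loss that
# prefers "tanh" is used here just to make the sketch runnable.
parametrization = ng.p.Instrumentation(activation=softmax_activation)
optimizer = ng.optimizers.OnePlusOne(parametrization=parametrization, budget=50)
recommendation = optimizer.minimize(lambda activation: 0.0 if activation == "tanh" else 1.0)
print(recommendation.kwargs)
```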
Facebook also conducted a special test.
As shown in the figure above, FastGA performs best in this test. Note that DoubleFastGA corresponds to a mutation rate between 1/dim and (dim - 1)/dim, rather than between 1/dim and 1/2; this is because the original range corresponds to a binary domain, whereas here Facebook considers arbitrary domains.
Okay, that's about all there is to say.
Please take the portal links below.
Portal
Nevergrad project address:
https://github.com/facebookresearch/nevergrad
Blog post address:
https://code.fb.com/ai-research/nevergrad/
Noise optimization example project address:
https://github.com/facebookresearch/nevergrad/blob/master/docs/benchmarks.md
-over-