Math ability beats ChatGPT and this 70B open-source large model is going viral: fine-tuning AI with AI, from an all-Chinese team at Microsoft
Fengse, reporting from Aofei Temple
Qubit | Official account QbitAI
Fine-tune a LLaMA-family large model with AI-generated instructions, and its math ability ends up surpassing ChatGPT:
Microsoft's latest open-source large model, WizardMath, is here.
As shown in the figure below, on the GSM8K benchmark WizardMath's math ability directly beats a host of large models such as ChatGPT, Claude Instant 1, and PaLM 2-540B.
And it does so with only 70 billion parameters, far fewer than the latter three.
Three online demos (with 7B, 13B, and 70B parameters respectively) are already live on Hugging Face; you can throw all kinds of math problems at them and try for yourself.
For example, solving the following quartic (fourth-degree) polynomial equation:
Or a simple calculus problem:
Or a slightly modified derivation of the Lagrange equation:
It gets them all right (and doesn't take too long, either). A minimal sketch for running the model locally is given below.
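For readers who would rather run the model locally than use the online demos, here is a minimal inference sketch using the Hugging Face transformers library. The checkpoint id "WizardLM/WizardMath-7B-V1.0" and the Alpaca-style prompt ending in "Let's think step by step." are assumptions about the public release, not guaranteed details; check the GitHub repo linked at the end of the article for the exact model names and prompt format.

```python
# Minimal local-inference sketch (not the official demo code).
# Assumptions: checkpoint id and prompt template below; adjust to the repo's docs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "WizardLM/WizardMath-7B-V1.0"  # assumed Hugging Face checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether?"
)
# Alpaca-style prompt with a chain-of-thought trigger (assumed format).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{question}\n\n### Response: Let's think step by step."
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```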
Netizens have been telling the authors:
The effect is really amazing; thank you for your contribution to open-source LLMs.
The code, reproduction recipe, and papers are all open-sourced or online, and the GitHub repo has picked up 4.8k stars in just a few days.
So, how does WizardMath do it?
Enhance large model capabilities with AI-generated instructions
OpenAI's large models (InstructGPT, GPT-4, etc.) are able to perform a variety of complex and diverse tasks with great success, in part because they are fine-tuned using open-domain instruction data generated by real human users.
However, not everyone has access to such instruction datasets the way this company does.
For one thing, the whole annotation process is extremely expensive and time-consuming; for another, it is hard for humans to produce a sufficient proportion of difficult instructions.
Developing a relatively low-cost way to automatically produce open-domain instructions at scale has therefore become key to instruction-tuning language models.
The authors call their method Evol-Instruct.
It is a new approach that uses AI in place of humans to automatically generate open-domain instructions covering a range of difficulty levels.
Specifically, Evol-Instruct consists of an Instruction Evolver and an Instruction Eliminator.
The Instruction Evolver upgrades a simple instruction into a more complex one, or creates a brand-new instruction, via one of two paths: in-depth evolving (the blue lines) or in-breadth evolving (the red lines).
Which path to take is simply chosen at random.
The in-depth evolving path is carried out through five types of operations: adding constraints, deepening, concretizing, increasing reasoning steps, and complicating the input.
Since all of the instructions are produced by AI, errors are sometimes inevitable, so the Instruction Eliminator is used to filter out failed instructions.
Here is a concrete example: starting from "1+1=?", the steps above end up automatically generating quite a few new instructions.
By repeating this generation process, enough instructions are eventually obtained; they are then merged and randomly shuffled into an instruction set with an evenly distributed range of difficulty, which is used to fine-tune the base large model. A minimal sketch of this loop is shown below.
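To make the pipeline more concrete, here is a minimal sketch of how such an evolution loop could be wired up. It is not the authors' code: the prompt wordings, the hypothetical chat() helper that calls a ChatGPT-style API, and the elimination check are all illustrative assumptions standing in for the paper's actual prompts and filtering rules.

```python
import random

# Illustrative (not official) prompt templates for the two evolution paths.
DEPTH_OPS = [
    "Add one more constraint or requirement to the following instruction.",
    "Ask a question that deepens the topic of the following instruction.",
    "Replace general concepts in the following instruction with more specific ones.",
    "Rewrite the following instruction so that it explicitly requires multi-step reasoning.",
    "Rewrite the following instruction with a more complicated input, such as a table or code.",
]
BREADTH_OP = "Create a brand-new instruction in the same domain but on a different, rarer topic than:"

def chat(prompt: str) -> str:
    """Hypothetical helper that sends one prompt to a ChatGPT-style API."""
    raise NotImplementedError

def evolve(instruction: str) -> str:
    """Instruction Evolver: randomly pick in-depth or in-breadth evolving."""
    if random.random() < 0.5:
        template = random.choice(DEPTH_OPS)   # in-depth evolving
    else:
        template = BREADTH_OP                 # in-breadth evolving
    return chat(f"{template}\n\n{instruction}")

def is_failed(old: str, new: str) -> bool:
    """Instruction Eliminator: a crude stand-in for the paper's filtering rules."""
    return len(new.strip()) == 0 or new.strip() == old.strip()

def run_rounds(seed_instructions: list[str], rounds: int = 4) -> list[str]:
    """Run several evolution rounds, keep successful instructions, then merge and shuffle."""
    pool, frontier = list(seed_instructions), list(seed_instructions)
    for _ in range(rounds):
        next_frontier = []
        for inst in frontier:
            new_inst = evolve(inst)
            if not is_failed(inst, new_inst):
                next_frontier.append(new_inst)
        pool.extend(next_frontier)
        frontier = next_frontier
    random.shuffle(pool)  # mix difficulty levels before fine-tuning
    return pool
```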
Concretely, the authors take Alpaca's training data (generated from just 175 human-written seed instructions) as the initial dataset and run four rounds of evolution through ChatGPT's API, ending up with 250,000 instructions.
For a fair comparison with Vicuna's 70k real-user data (ShareGPT), the authors sampled an equal amount from those 250,000 instructions and used it to train LLaMA 7B, obtaining WizardLM; WizardLM ended up significantly outperforming Vicuna (a rough sketch of this sampling step is given below).
(Alpaca is a model fine-tuned by Stanford on top of LLaMA-7B; Vicuna was fine-tuned by UC Berkeley on top of LLaMA-13B.)
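As a rough illustration of that sampling step, here is a sketch of drawing an equally sized subset and formatting it for supervised fine-tuning. The file names, the literal 70k sample size, and the Alpaca-style prompt template are assumptions for illustration; it also assumes a response has already been collected for each evolved instruction, and it is not the authors' actual training code.

```python
import json
import random

random.seed(0)

# Assumed file name: the full evolved set produced by a loop like the one above,
# where each entry already contains an "instruction" and a generated "output".
with open("evolved_instructions.json") as f:
    evolved = json.load(f)

# Match the size of Vicuna's ShareGPT data (70k) for a fair comparison.
subset = random.sample(evolved, k=70_000)

def to_training_text(example: dict) -> str:
    """Format one instruction/response pair into an Alpaca-style training string (illustrative)."""
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

# Write one JSON object per line; this can then be fed to any standard
# supervised fine-tuning recipe for a causal LM such as LLaMA 7B.
with open("wizardlm_sft_70k.jsonl", "w") as f:
    for ex in subset:
        f.write(json.dumps({"text": to_training_text(ex)}) + "\n")
```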
In addition, on more complex test instructions, human evaluators preferred WizardLM's output over ChatGPT's, indicating that this method can significantly improve an LLM's ability to handle complex instructions.
Building on this, the authors used Evol-Instruct to generate a large number of math-domain instructions and fine-tuned a LLaMA-family model, obtaining WizardMath.
The effect is what you saw at the beginning: its math ability, measured on the GSM8K dataset, surpasses a number of large models including ChatGPT, Claude Instant 1, and PaLM 2-540B, ranking 5th overall, behind only GPT-4, Claude 1.3 and 2.0, and the 540-billion-parameter Flan-PaLM 2.
In the same way, the authors also produced WizardCoder, a model specializing in coding, whose performance exceeds that of Claude and Bard (for details, see the addresses at the end of the article).
Team introduction
The paper has 9 authors, all of them Chinese.
There are three co-first authors:
Can Xu is a senior applied scientist in the S+D NLP group of the Microsoft Asia Internet Engineering Institute; he previously worked on chatbot systems in the Microsoft Xiaoice group and at Microsoft Research Asia;
Qingfeng Sun, a Microsoft Research scientist, works on natural language processing and information retrieval, specializes in building efficient search systems, and has contributed core deep models to Microsoft Bing and Office 365;
Kai Zheng, a Microsoft Research scientist, works on natural language processing, search, and recommendation ranking, and has likewise contributed core deep models to Microsoft Bing and Office 365.
The corresponding author is Daxin Jiang, a Microsoft global partner and vice president and former chief scientist of Microsoft Research Asia. He has worked at Microsoft for more than 16 years, leading natural language understanding for Microsoft's Bing search engine and the Cortana intelligent assistant, and it was recently reported that he has resigned to start a large-model company.
There is one more author, Jiazhan Feng, a Peking University student; this co-authored paper came out of their internship at Microsoft.
Project homepage: https://github.com/nlpxucan/WizardLM/tree/main/WizardMath
Paper addresses:
https://arxiv.org/abs/2304.12244 (WizardLM)
https://arxiv.org/abs/2306.08568 (WizardCoder)
-over-
"AIGC+Vertical Field Community"
Recruiting!
Partners who follow AIGC are welcome to join the AIGC+ vertical community and learn, explore and innovate AIGC together!
Please note the vertical field "education" or "advertising marketing" you want to join. To join the AIGC talent community, please note "talent" & "name-company-position".