
Math ability surpasses ChatGPT! Shanghai Jiao Tong University's large math model tops the open-source leaderboard

Last updated: 2023-09-22
Cressy, from Aofei Temple
Qubits | Official account QbitAI

A domestically developed large math model has surpassed ChatGPT in mathematical ability!

In the latest rankings, Abel, the large model developed in-house by the GAIR laboratory at Shanghai Jiao Tong University, reached an accuracy of 83.6%, ranking first among open-source models.

According to the team, the model is named after the Norwegian mathematician Niels Abel, in tribute to his pioneering work in algebra and analysis.

On the GSM8k dataset, the 70B-parameter Abel beat every open-source model and even surpassed ChatGPT.

On the newer TALSCQ-EN dataset, Abel even performed better than GPT-4.

The recipe behind this result is remarkably simple:

  • No tool use

  • No large-scale pre-training data in the mathematics domain

  • No reward model

  • No RLHF

  • Only supervised fine-tuning (SFT)

So how well does Abel actually perform?

Results surpassing the open-source SOTA

Here we compare Abel against Llama-2, which is also open source.

First, let’s look at a variation of the chicken-and-rabbit problem:

Brown has 60 cows and chickens in total. There are twice as many chickens as cows. How many legs are there in total?

Llama-2 got off to a bad start on this question, and its failure was not a calculation slip but a logical error:

Abel successfully solved this problem.
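
For reference, here is the arithmetic the models are being asked to carry out, written as a minimal Python check (the question comes from the article; the script itself is ours):

    # Brown has 60 cows and chickens in total; there are twice as many chickens as cows.
    total = 60
    cows = total // 3                # cows + 2*cows = 60  ->  cows = 20
    chickens = 2 * cows              # 40 chickens

    legs = 4 * cows + 2 * chickens   # 80 + 80
    print(legs)                      # 160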

Let’s look at the next question:

What is the sum of the median and mean of 12, 21, 6, 11 and 30?

Both models understood the concepts involved, but Llama-2 still made mistakes in sorting and calculation.

Abel, meanwhile, answered this question correctly as well.
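
Again for reference, the expected computation, as a short Python sketch of ours:

    # Sum of the median and mean of 12, 21, 6, 11 and 30.
    nums = sorted([12, 21, 6, 11, 30])   # [6, 11, 12, 21, 30]
    median = nums[len(nums) // 2]        # middle of five sorted values -> 12
    mean = sum(nums) / len(nums)         # 80 / 5 -> 16.0
    print(median + mean)                 # 28.0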

Now let's look at Abel's performance on the benchmark numbers.

The first is GSM8k, the dataset proposed by OpenAI (roughly US grade-school difficulty). Abel takes three of the top ten spots on this leaderboard (at different parameter scales).

Among open-source models, the 70B Abel beat the previous SOTA, WizardMath.

If commercial closed-source models are included, Abel trails only the best-known models such as GPT-4, Claude-2, and PaLM-2-Flan.

Even ChatGPT is no match for Abel.

(Figure legend: the globe icon marks open-source models; the lock icon marks closed-source models.)

On the harder MATH dataset (competition-level problems), Abel took the top three open-source spots with its three parameter scales, and with closed-source models included it trailed only offerings from Google and OpenAI.

The research team also tested Abel on the new TALSCQ-EN dataset, where its results exceeded GPT-4's.

So how did the team train such a high-performing model?

A "nanny-level" fine-tuning strategy

The core secret is high-quality training data.

Abel's training data is carefully curated: each example contains not only the answer to a question but also the reasoning that leads to that answer.

To this end, the team proposed a "nanny-level" fine-tuning strategy called Parental Oversight.

Under this principle of parental supervision, the team completed Abel's training through SFT alone.
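
To make the idea concrete: each training sample pairs a question with a full worked solution rather than a bare answer. The sketch below is our illustration only; the field names and formatting are assumptions, not the team's released data schema:

    # Hypothetical shape of one SFT sample under "Parental Oversight":
    # the supervision target is a step-by-step solution, not just the answer.
    # (Field names are illustrative, not Abel's actual schema.)
    sample = {
        "question": (
            "Brown has 60 cows and chickens in total. There are twice as many "
            "chickens as cows. How many legs are there in total?"
        ),
        "response": (
            "Let the number of cows be c. Then chickens = 2c and c + 2c = 60, "
            "so c = 20 and chickens = 40. "
            "Legs = 20 * 4 + 40 * 2 = 160. The answer is 160."
        ),
    }
    # SFT trains the model to generate `response` given `question`,
    # with a standard cross-entropy loss on the response tokens.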

To evaluate Abel's robustness, the team also used GPT-4 to alter the numbers in GSM8k problems and tested whether Abel could still reach the correct answers.

The results show that on the adjusted GSM8k set, the 70B Abel is more robust than WizardMath at the same scale.
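
As an illustration of the test's idea (the team used GPT-4 for the rewriting; the regex-based function below is our simplified stand-in, not their pipeline):

    import random
    import re

    def perturb_numbers(question: str, seed: int = 0) -> str:
        """Replace each integer in a GSM8k-style question with a random
        nearby value while keeping the wording intact."""
        rng = random.Random(seed)
        return re.sub(r"\d+", lambda m: str(rng.randint(2, 99)), question)

    q = "A farmer has 60 animals and sells 12 of them. How many are left?"
    print(perturb_numbers(q))  # same question, different numbers

A model that genuinely learned the reasoning, rather than memorizing the original numbers, should still solve the perturbed versions.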

At the end of Abel’s introduction, the research team also left an Easter egg:

Abel's next generation will evolve into Bernoulli

The team did not explain what this means, so we will have to wait and see.

Team Profile

Abel was created by the GAIR (Generative Artificial Intelligence Research) group at Shanghai Jiao Tong University.

The team has also released a Gaokao (college entrance exam) benchmark for large models, the AIGC fact-checking tool Factool, and other work.

The team is led by Liu Pengfei, an associate professor at the Qing Yuan Research Institute, who also heads the Abel project.

Readers interested in this math model can learn more on its GitHub page.

GitHub page:
https://github.com/GAIR-NLP/abel

- End -

"Qubit 2023 Artificial Intelligence Annual Selection" has begun!

This year, the Qubit 2023 Artificial Intelligence Annual Selection has established 5 categories of awards from the three dimensions of enterprises, people, and products/solutions! Welcome to scan the QR code to register

The most influential annual intelligent business summit MEET 2024 Intelligent Future Conference has been launched! Click here to learn more .


Click here ???? Follow me and remember to star~

Three consecutive clicks of "Share", "Like" and "Watching"

Advances in cutting-edge science and technology are seen every day ~


Latest articles about

 
EEWorld WeChat Subscription

 
EEWorld WeChat Service Number

 
AutoDevelopers

About Us Customer Service Contact Information Datasheet Sitemap LatestNews

Room 1530, Zhongguancun MOOC Times Building,Block B, 18 Zhongguancun Street, Haidian District,Beijing, China Tel:(010)82350740 Postcode:100190

Copyright © 2005-2024 EEWORLD.com.cn, Inc. All rights reserved 京ICP证060456号 京ICP备10001474号-1 电信业务审批[2006]字第258号函 京公网安备 11010802033920号