

GPT-4o mini tops the large model arena; Altman: fine-tuning is free for the next two months

Last updated: 2024-07-24
Cressy from Aofei Temple
Quantum Bit | Public Account QbitAI

Just now, GPT-4o mini had its "highlight moment":

It topped the LMSYS large model arena, tying for first place with the full-size GPT-4o and leaving Claude 3.5 behind.

Unlike conventional evaluations on fixed datasets, the arena's results come from users posing their own questions and voting with their feet; there is no way to game it by cramming test questions, so the results are more realistic.

When the result came out, even OpenAI CEO Altman got excited:

We try to be as reserved as possible about evaluation results, but we were very excited to see GPT-4o mini perform on par with the full version at only 1/20 of the price.

Seeing this, netizens said fair enough, but they were more concerned about when the "Her"-style voice assistant demoed at the GPT-4o launch event would actually ship.

At the same time, OpenAI also brought another piece of good news, this one for developers:

Fine-tuning for GPT-4o mini is being rolled out gradually: it is currently available to Tier 4 and Tier 5 users, with access to be expanded over time.

Moreover, from now until September 23, 2 million training tokens per day are free.
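For developers in the eligible tiers, kicking off a job means uploading a chat-formatted JSONL file and then creating a fine-tuning job against a GPT-4o mini snapshot. Below is a minimal sketch using the OpenAI Python SDK; the file name and the snapshot identifier gpt-4o-mini-2024-07-18 are illustrative assumptions rather than details from the announcement.

    # Minimal sketch: fine-tuning GPT-4o mini via the OpenAI Python SDK.
    # The file name and model snapshot below are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # 1. Upload chat-formatted training examples (one JSON object per line).
    training_file = client.files.create(
        file=open("train_examples.jsonl", "rb"),
        purpose="fine-tune",
    )

    # 2. Create the fine-tuning job against a GPT-4o mini snapshot.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4o-mini-2024-07-18",
    )

    print(job.id, job.status)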

Mini on par with the full version

After millions of 1v1 battles among more than 80 models, GPT-4o mini's score on the LMSYS leaderboard was only 7 points behind the full-size GPT-4o.

By LMSYS's ranking rules, a 7-point gap does not change the ranking, so the two models are counted as tied for first place.

They are followed by Claude 3.5, the Gemini family, and two other GPT-4 variants.

Looking at the raw data, GPT-4o mini's average win rate of about 0.6 is second only to the full-size version's.

And in head-to-head battles between the two, they are evenly matched.

The reason LMSYS results draw so much attention is the arena's unique way of running evaluations:

Instead of using a fixed dataset, users pose their own questions, two models are randomly paired for a 1v1 battle, and the user then votes for whichever performs better.

Before the vote is cast, the models are anonymous; the user does not know which two are competing, and if a model gives away its identity, the vote is discarded.

Scores obtained this way are harder to inflate by training on test questions and are closer to real user experience.
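Behind the leaderboard, these pairwise votes are aggregated into a single score per model. LMSYS fits a Bradley-Terry model over all recorded battles; the Elo-style online update below is only a simplified illustration of the same idea, with the starting rating and K-factor chosen arbitrarily.

    # Simplified Elo-style aggregation of pairwise arena votes.
    # LMSYS actually fits a Bradley-Terry model over all battles; this online
    # update only illustrates how head-to-head wins turn into leaderboard scores.
    from collections import defaultdict

    K = 32  # arbitrary update step, not the arena's actual setting
    ratings = defaultdict(lambda: 1000.0)  # arbitrary starting rating

    def record_battle(winner: str, loser: str) -> None:
        """Update both models' ratings after one anonymous 1v1 vote."""
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += K * (1.0 - expected)
        ratings[loser] -= K * (1.0 - expected)

    # A few hypothetical votes
    record_battle("model-a", "model-b")
    record_battle("model-b", "model-a")
    record_battle("model-a", "model-c")
    print(sorted(ratings.items(), key=lambda kv: -kv[1]))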

The arena's methodology was recently accepted at ICML 2024, a top machine learning conference.

OpenAI itself is also fond of LMSYS's evaluation: before the official launch, an early version of GPT-4o mini appeared on the leaderboard under the pseudonym gpt-mini.

It already ranked 4th at the time, on par with GPT-4-Turbo.

Even earlier, before GPT-4o went live, it was tested on LMSYS under the name gpt2-chatbot.

However, some have raised doubts: GPT-4o mini's performance is indeed very good, but claiming it surpasses Claude 3.5 Sonnet is a bit of a stretch.

Some even bluntly said that the integrity of the LMSYS methodology is starting to break down and that changes are needed, or it will no longer be a useful benchmark.

The "small model" is also rolled up

The selling point of the mini version is cost-effectiveness.

Input and output are priced at 15 cents and 60 cents per million tokens respectively (about 1.09/4.36 RMB), less than half the price of GPT-3.5 Turbo.

Compared with text-davinci-003, the GPT-3 variant that was the best model available two years ago, the price has dropped by 99%.
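To make those numbers concrete, here is a back-of-the-envelope cost calculation at the listed GPT-4o mini rates; the workload size is made up purely for illustration.

    # Back-of-the-envelope cost at the listed GPT-4o mini prices:
    # $0.15 per million input tokens, $0.60 per million output tokens.
    INPUT_PRICE_PER_M = 0.15
    OUTPUT_PRICE_PER_M = 0.60

    def cost_usd(input_tokens: int, output_tokens: int) -> float:
        return (input_tokens / 1e6) * INPUT_PRICE_PER_M + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

    # Hypothetical daily workload: 10M input tokens, 2M output tokens
    print(f"${cost_usd(10_000_000, 2_000_000):.2f} per day")  # -> $2.70 per day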

Beyond offering the small model to users, OpenAI has also found a new way to use small models:

In a final paper left behind by the disbanded Superalignment team, a small model with one-hundredth to one-thousandth the parameters of the large model is used to help optimize the large model.

In the experiment, the large and small models play a game against each other: the large model must keep adjusting its output so that the small model is convinced it is telling the truth.

During this "game", the capabilities of the large model have been improved, and comprehensibility has been greatly improved without a significant loss of accuracy.

In addition to OpenAI, other companies have also started to develop small models.

For example, before GPT-4o mini, Google and Anthropic had already launched Gemini Flash and Claude 3 Haiku respectively.

GPT-4o mini can even be seen as OpenAI's counterattack against those two, beating both on performance and on price.

In the same week GPT-4o mini was released, Hugging Face and "Europe's OpenAI" Mistral each launched small models of their own.

Even Apple released a 7B model of its own, open-sourcing the entire training process and resources in one go.

In short, as long as the performance is good enough for the task at hand, a small model is undoubtedly the more economical choice.

A smaller scale also means the model can run on-device, with advantages in areas such as privacy protection.

It is not hard to see why "small" models are becoming more and more popular.

Reference links:
[1] https://x.com/sama/status/1815877987696533897/
[2] https://x.com/OpenAIDevs/status/1815836887631946015

-over-


