A slap in the face for Altman: GPT-4 is lazier this year than last! Netizens put it to the test
Cressy, from Aofei Temple
QbitAI | WeChat official account QbitAI
There's an update on the problem of GPT-4 getting lazy.
Early this morning, Sam Altman tweeted that GPT-4's laziness should be much improved in the new year!
Netizens have complained countless times about GPT-4 getting lazy, most often on code-related tasks:
Not only is the code frequently incomplete, it also gets chopped into small fragments that have to be copied over one by one.
As for the latest version, one blogger who tried it said he used it to build a small learning game for first-graders, and the result was pretty good.
But not everyone agrees. One netizen found that although ChatGPT's replies have gotten longer, much of the extra length is idle chatter, and it still fumbles the actual task.
He asked ChatGPT to translate some text into 17 languages; it rambled on at length but never actually did the translation.
To rule out individual variation, some netizens tested the new ChatGPT against a benchmark dataset, and the results...
Is the new version even lazier?
This netizen used an open-source "laziness benchmark" from GitHub to test the 0125 (the latest version, from January 2024) and 1106 (the previous version, from November 2023) GPT-4 models, and found that the new version is actually even lazier than before.
The test dataset consists of code-related tasks, and the proportion completed correctly indirectly reflects the degree of "laziness": the higher the completion rate, the less "lazy" the model.
On the code-editing (unified diffs) task, the old version completed more than half, at 57%, while the new version's completion rate was only 44%, a drop of nearly a quarter.
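As a quick sanity check on that "nearly a quarter" claim, the relative drop from 57% to 44% can be worked out directly (figures as quoted from the benchmark above):

```python
# Relative drop in completion rate between the two GPT-4 versions,
# using the figures quoted from the benchmark results above.
old_rate = 0.57  # 1106 (November 2023) completion rate, unified-diffs task
new_rate = 0.44  # 0125 (January 2024) completion rate

relative_drop = (old_rate - new_rate) / old_rate
print(f"{relative_drop:.1%}")  # prints "22.8%", i.e. nearly a quarter
```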
Subjectively, some people also feel ChatGPT's "laziness" has taken a turn for the worse:
In the past, even when it was lazy, it would at least go through the motions and sketch a rough framework for users to fill in themselves. Now it simply claims it can't do it.
In response to these findings, some offered a pointed comment:
Altman said a few weeks ago that GPT-4's performance had improved, but has anyone actually felt the difference?
This time, Altman did not elaborate on why GPT-4 became lazy or what optimization strategies were adopted.
"Folk remedies" can reduce laziness
That said, an earlier study suggested that GPT-4's laziness may be time-related, which is consistent with GPT-4 "getting lazy" in December, at the end of the year.
By that theory, the model's performance should indeed pick up at the start of a new year, but it doesn't seem to explain performance falling instead of rising.
In the meantime, netizens have collected some "folk remedies" that can reduce ChatGPT's laziness to a certain extent.
For example, telling it "I have no fingers" gets you relatively complete code instead of fragments.
Or, telling ChatGPT that you will "tip" it can also motivate it to work.
Some people even studied the tip amount and found $10 to be the most cost-effective.
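In practice, these tricks just mean appending an incentive line or two to the prompt before sending it to the model. A minimal sketch, where the helper name and exact phrasing are my own illustration (only the "$10" figure and the two tricks come from the article):

```python
# Hypothetical helper that appends the "anti-laziness" incentives described
# above to a user prompt. The wording is illustrative, not canonical.
def add_incentives(prompt: str, tip_usd: int = 10, no_fingers: bool = True) -> str:
    extras = []
    if no_fingers:
        extras.append("I have no fingers, so please write out the complete code.")
    if tip_usd > 0:
        extras.append(f"I will tip ${tip_usd} for a thorough answer.")
    return prompt + "\n\n" + "\n".join(extras)

print(add_incentives("Translate this text into 17 languages."))
```

The augmented prompt would then be sent to the model as a normal chat message; whether the incentives actually help is, per the article, anecdotal.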
So, do you think ChatGPT has gotten better, or lazier?
Reference links:
[1] https://twitter.com/sama/status/1754172149378810118
[2] https://aider.chat/docs/benchmarks-0125.html
-over-