A slap in the face for Altman: GPT-4 is lazier this year than last! Netizens put it to the test
Cressy, from Aofei Temple
QbitAI | WeChat official account QbitAI
There's an update on the problem of GPT-4 getting lazy.
Early this morning, Sam Altman tweeted that GPT-4's laziness should be much improved in the new year!
Netizens have complained countless times about GPT-4 getting lazy, most often on code-related tasks:
Not only is the code frequently incomplete, it also gets chopped into small fragments that have to be copied over one by one.
As for the latest version, one blogger who tried it said he used it to build a small learning game for first-graders, and the result was pretty good.
But not everyone agrees. One netizen found that although ChatGPT's replies have gotten longer, much of the extra length is idle chatter, and it still fumbles the actual task.
He asked ChatGPT to translate some text into 17 languages; it rambled on at length but never actually did the translation.
To rule out individual variation, some netizens tested the new ChatGPT against a benchmark dataset, and the results...
Is the new version even lazier?
This netizen used an open-source "laziness benchmark" from GitHub to test the 0125 (the latest version, from January 2024) and 1106 (the previous version, from November 2023) GPT-4 models, and found that the new version is actually even lazier than before.
The test dataset consists of code-related tasks, and the proportion completed correctly indirectly reflects the degree of "laziness": the higher the completion rate, the less "lazy" the model.
On the code-editing (unified diffs) task, the old version completed more than half, at 57%, while the new version's completion rate was only 44%, a drop of nearly a quarter.
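As a quick sanity check on that "nearly a quarter" claim, the relative drop from 57% to 44% can be worked out directly (figures as quoted from the benchmark above):

```python
# Relative drop in completion rate between the two GPT-4 versions,
# using the figures quoted from the benchmark results above.
old_rate = 0.57  # 1106 (November 2023) completion rate, unified-diffs task
new_rate = 0.44  # 0125 (January 2024) completion rate

relative_drop = (old_rate - new_rate) / old_rate
print(f"{relative_drop:.1%}")  # prints "22.8%", i.e. nearly a quarter
```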
Subjectively, some people also feel ChatGPT's "laziness" has taken a turn for the worse:
In the past, even when it was lazy, it would at least go through the motions and sketch a rough framework for users to fill in themselves. Now it simply claims it can't do it.
In response to these findings, some offered a pointed comment:
Altman said a few weeks ago that GPT-4's performance had improved, but has anyone actually felt the difference?
This time, Altman did not elaborate on why GPT-4 became lazy or what optimization strategies were adopted.
"Folk remedies" can reduce laziness
That said, an earlier study suggested that GPT-4's laziness may be time-related, which is consistent with GPT-4 "getting lazy" in December, at the end of the year.
By that theory, the model's performance should indeed pick up at the start of a new year, but it doesn't seem to explain performance falling instead of rising.
In the meantime, netizens have collected some "folk remedies" that can reduce ChatGPT's laziness to a certain extent.
For example, telling it "I have no fingers" gets you relatively complete code instead of fragments.
Or, telling ChatGPT that you will "tip" it can also motivate it to work.
Some people even studied the tip amount and found $10 to be the most cost-effective.
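In practice, these tricks just mean appending an incentive line or two to the prompt before sending it to the model. A minimal sketch, where the helper name and exact phrasing are my own illustration (only the "$10" figure and the two tricks come from the article):

```python
# Hypothetical helper that appends the "anti-laziness" incentives described
# above to a user prompt. The wording is illustrative, not canonical.
def add_incentives(prompt: str, tip_usd: int = 10, no_fingers: bool = True) -> str:
    extras = []
    if no_fingers:
        extras.append("I have no fingers, so please write out the complete code.")
    if tip_usd > 0:
        extras.append(f"I will tip ${tip_usd} for a thorough answer.")
    return prompt + "\n\n" + "\n".join(extras)

print(add_incentives("Translate this text into 17 languages."))
```

The augmented prompt would then be sent to the model as a normal chat message; whether the incentives actually help is, per the article, anecdotal.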
So, do you think ChatGPT has gotten better, or lazier?
Reference links:
[1] https://twitter.com/sama/status/1754172149378810118
[2] https://aider.chat/docs/benchmarks-0125.html
-over-