Small model takes on a model with 14x its parameters: Google proposes a new test-time scaling law
West Wind, from Aofei Temple
Quantum Bit | Official Account QbitAI
Without increasing model parameters, and with the same compute budget, a small model outperforms a model 14 times its size!
Google DeepMind's latest research has sparked heated discussion, and some even speculated that this may be the method behind OpenAI's upcoming model, Strawberry.
The research team explored how to optimally allocate computation when doing inference with large models, dynamically adjusting the allocation of test-time compute based on the difficulty of a given prompt.
They found that in some cases this approach is more cost-effective than simply scaling up model parameters.
In other words, spending less compute in the pre-training phase and more at inference time may be the better strategy.
Using extra computation to improve output at inference time
The core question of this study is:
Given a fixed compute budget for solving a prompt, different computational strategies vary significantly in effectiveness across problems. How should we evaluate and choose the test-time strategy best suited to the problem at hand? And how does that strategy compare with simply using a larger pre-trained model?
The DeepMind research team explored two main mechanisms for scaling computation at test time.
One is searching against a dense, process-based verifier reward model (PRM).
A PRM scores each step as the model generates an answer. These scores guide the search algorithm, letting it dynamically adjust its strategy and avoid wasting compute by identifying incorrect or inefficient paths during generation.
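The idea can be sketched in a few lines of Python. Everything here is a toy stand-in invented for illustration: `toy_generator` and `toy_prm_score` replace a real language model and a real learned verifier, and the scoring rule is made up. Only the selection logic reflects the technique: aggregate per-step scores and keep the best-scoring path.

```python
import math

# Toy stand-ins: a "generator" proposes candidate solution paths (lists of
# reasoning steps), and a hypothetical PRM assigns each step a score in [0, 1].
# A real system would call a language model and a learned verifier here.
def toy_generator():
    return [
        ["factor the expression", "cancel terms", "answer: 4"],   # sound path
        ["guess randomly", "answer: 7"],                          # lazy path
        ["factor the expression", "sign error", "answer: -4"],    # flawed path
    ]

def toy_prm_score(step):
    # Invented scoring rule: known-bad steps get low scores, others high.
    penalties = {"guess randomly": 0.1, "sign error": 0.2}
    return penalties.get(step, 0.9)

def prm_guided_best_path(paths, score_step):
    """Aggregate per-step PRM scores (product, roughly a joint correctness
    estimate) and return the highest-scoring candidate path."""
    def path_score(path):
        return math.prod(score_step(s) for s in path)
    return max(paths, key=path_score)

best = prm_guided_best_path(toy_generator(), toy_prm_score)
print(best[-1])  # the answer line of the winning path
```

A search-based method would additionally prune low-scoring paths mid-generation instead of scoring only completed ones; this sketch keeps just the scoring-and-selection step.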
The other is adaptively updating the model's distribution over responses at test time, conditioned on the prompt.
Rather than generating a final answer in one shot, the model sequentially revises and improves its previous attempts.
The figure below compares parallel sampling with sequential revision: parallel sampling generates N answers independently, while in sequential revision each answer conditions on the previous generation and is refined step by step.
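The two regimes can be contrasted with a toy numeric "model". All names and the error model below are invented for illustration (a real system would sample from a language model); the point is only the control flow: independent draws versus each draw conditioning on the last.

```python
import random

TARGET = 42.0  # the "correct answer" in this toy setup

def toy_model(prompt, previous=None, rng=random):
    """Invented stand-in for an LLM: a fresh call returns a noisy guess;
    a call that sees its previous draft moves halfway toward the target."""
    if previous is None:
        return TARGET + rng.uniform(-10, 10)      # fresh independent guess
    return previous + 0.5 * (TARGET - previous)   # revise toward correctness

def parallel_sampling(prompt, n, rng):
    # N independent samples; a verifier would pick the best one afterwards.
    return [toy_model(prompt, rng=rng) for _ in range(n)]

def sequential_revision(prompt, n, rng):
    # Each answer depends on the previous one and is revised step by step.
    answers, prev = [], None
    for _ in range(n):
        prev = toy_model(prompt, previous=prev, rng=rng)
        answers.append(prev)
    return answers

rng = random.Random(0)
par = parallel_sampling("q", 4, rng)
seq = sequential_revision("q", 4, rng)
# In this toy setup the sequential chain steadily approaches the target,
# while parallel samples stay independently noisy.
```

Under a fixed budget of N model calls, the two spend the same compute but explore very differently: parallel sampling buys diversity, sequential revision buys refinement.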
By studying these two strategies, the team found that the effectiveness of different methods highly depends on the difficulty of the prompt.
Therefore, the team proposed a "compute-optimal" scaling strategy that adaptively allocates test-time compute according to prompt difficulty.
They classified problems into five difficulty levels and selected the best strategy for each level.
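This difficulty-binned selection can be sketched as a lookup, assuming difficulty is estimated from the base model's pass rate on each prompt. The bin thresholds and the strategy table below are made-up placeholders, not the paper's learned values; the paper chooses them from validation accuracy under a fixed budget.

```python
# Placeholder table: which test-time strategy worked best at each of five
# difficulty levels (values invented for illustration).
BEST_STRATEGY = {
    1: "sequential_revision",   # easy: refine a first guess
    2: "sequential_revision",
    3: "mixed",                 # medium: some parallel, some sequential
    4: "parallel_search",       # hard: explore diverse candidates with a PRM
    5: "parallel_search",
}

def difficulty_level(pass_rate):
    """Map an estimated per-prompt pass rate (fraction of base-model samples
    that are correct) to one of five difficulty bins, hardest = 5."""
    for level, threshold in enumerate((0.8, 0.6, 0.4, 0.2), start=1):
        if pass_rate >= threshold:
            return level
    return 5

def choose_strategy(pass_rate):
    return BEST_STRATEGY[difficulty_level(pass_rate)]

print(choose_strategy(0.9))   # an easy prompt
print(choose_strategy(0.05))  # a hard prompt
```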
As shown in the left figure below, in the revision setting, the gap between the standard best-of-N approach (generate multiple answers and pick the best) and compute-optimal scaling gradually widens, with compute-optimal scaling surpassing best-of-N while using 4x less test-time compute.
Similarly, in the PRM search setting, compute-optimal scaling shows significant early gains over best-of-N, and in some cases approaches or exceeds best-of-N performance with 4x less compute.
The right panel of the figure above compares a PaLM 2-S model using compute-optimal scaling at test time against a pre-trained model that uses no additional test-time compute but is 14x larger.
The researchers considered both models pre-trained on X tokens and serving Y tokens at inference. In the revision setting (top right), when Y ≪ X, test-time computation generally outperforms additional pre-training.
However, as the ratio of inference tokens to pre-training tokens grows, test-time computation remains preferable on easy problems, while on harder problems additional pre-training wins out. The researchers observed a similar trend in the PRM search setting.
The study also compared test-time computation against additional pre-training compute. With matched compute, additional test-time computation generally outperformed additional pre-training on easy and medium-difficulty problems.
On harder problems, additional pre-training compute was more effective.
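A rough FLOPs-accounting sketch shows what "matched compute" means here, using the standard approximations of about 6·N·D FLOPs to pre-train an N-parameter model on D tokens and about 2·N FLOPs per inference token. The concrete parameter and token counts below are invented for illustration, not the paper's.

```python
def total_flops(n_params, pretrain_tokens, inference_tokens):
    """Approximate lifetime FLOPs: ~6*N*D for pre-training plus
    ~2*N per token generated at inference."""
    train = 6 * n_params * pretrain_tokens
    infer = 2 * n_params * inference_tokens
    return train + infer

X, Y = 1e12, 1e9                  # pre-training tokens >> inference tokens
small = total_flops(1e9, X, Y)    # 1B-parameter model (numbers illustrative)
big = total_flops(14e9, X, Y)     # 14x larger model, same data

# In the Y << X regime the 14x model costs ~14x the total compute...
ratio = big / small
# ...so under the big model's budget, the small model could instead afford
# this many extra inference tokens of test-time search or revision:
headroom = (big - small) / (2 * 1e9)
print(round(ratio, 1), headroom > Y)
```

This is why the small-model-plus-test-time-compute trade is attractive when inference volume is low relative to pre-training, and why it erodes as Y grows toward X.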
Overall, the study suggests that current methods for scaling test-time computation cannot yet fully replace scaling pre-training, but they already show advantages in certain cases.
Sparked heated discussion among netizens
Once the research was posted online, it sparked heated discussion.
Some netizens even said that this explains the reasoning method of OpenAI's "Strawberry" model.
Why do you say that?
It turns out that just last night, the outlet The Information reported that OpenAI's new model Strawberry is slated for release within the next two weeks, with greatly improved reasoning ability that requires no additional prompting from users.
Strawberry does not blindly chase scaling laws; its biggest difference from other models is that it "thinks" before answering.
As a result, Strawberry takes 10-20 seconds to respond.
This netizen speculated that Strawberry may have used a method similar to that of Google DeepMind's study (doge):
If you disagree, explain with an alternative line of reasoning!
And explain they did:
This article explores best-of-n sampling and Monte Carlo Tree Search (MCTS) .
Strawberry might be a hybrid depth model with special tokens (e.g., backtracking, planning). It might be trained with human data annotators and reinforcement learning on easily verifiable domains (e.g., math/programming).
Paper link: https://arxiv.org/pdf/2408.03314
Reference links:
[1] https://x.com/deedydas/status/1833539735853449360
[2] https://x.com/rohanpaul_ai/status/1833648489898594815
-over-
QuantumBit's annual AI theme plan is now soliciting submissions!
Welcome to contribute to the special topics "1,001 AI Applications" and "365 AI Implementation Solutions",
or share with us the AI products you are looking for and the new AI trends you have discovered.