The o1 cornerstone paper goes viral, and Ilya is still the key man! Tsinghua and Peking University alumni shine in core projects

Last updated: 2024-09-16


By Bai Xiaojiao and Xi Xiaofeng
QbitAI | WeChat official account QbitAI

Ever since Ilya Sutskever's name appeared on the list of the team behind OpenAI o1, his role in o1 has drawn the attention of many netizens.

Just now, machine learning engineer Rohan Paul posted that a paper Ilya co-authored in May last year should not be missed.

The paper is titled "Let's Verify Step by Step".

Besides Ilya, many of the paper's other authors are also contributors to OpenAI o1.

Some netizens even called it the second most famous paper in AI, after "Attention Is All You Need".

In addition, amid the heated discussion about the team behind OpenAI o1, OpenAI scientist Noam Brown recently posted a clarification that he did not lead Strawberry/OpenAI o1.

At the same time, he revealed that o1 is the result of many years of research, and that its development has really accelerated since October last year.

In this light, it is not surprising that Ilya Sutskever is a "foundational contributor" to OpenAI o1.

Next, let’s take a closer look at the “Let’s Verify Step by Step” paper and the contributors behind OpenAI o1.

Ilya's role in o1

OpenAI o1 focuses on general, complex reasoning. Before it outputs an answer, it generates a long chain of thought, which strengthens its reasoning ability.

The paper co-authored by Ilya mainly discussed methods to improve the multi-step reasoning capabilities of large language models.

They mainly compared outcome supervision and process supervision as ways to train reward models.

Outcome supervision focuses only on whether the model's final output is correct.

Process supervision focuses on whether each step of the model's reasoning is correct, and can point out exactly which step in an answer is wrong.
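To make the difference concrete, here is a minimal sketch of what each kind of supervision sees for one model-generated solution. The `Solution` class and helper names are hypothetical, and the per-step labels are simplified to booleans (the paper's PRM800K labels also include a neutral class). In the example, two arithmetic slips cancel out, so outcome supervision marks the solution correct while process supervision flags the first bad step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Solution:
    steps: List[str]
    step_labels: List[bool]      # hypothetical per-step labels (process supervision)
    final_answer_correct: bool   # one label for the whole solution (outcome supervision)


def outcome_label(sol: Solution) -> bool:
    """Outcome supervision: a single signal about the final answer."""
    return sol.final_answer_correct


def first_error(sol: Solution) -> Optional[int]:
    """Process supervision: index of the first step labeled wrong, or None."""
    for i, ok in enumerate(sol.step_labels):
        if not ok:
            return i
    return None


# Problem: what is 2 * 3 + 4? Two mistakes cancel and still reach 10.
sol = Solution(
    steps=["2 * 3 = 5", "5 + 4 = 10", "So the answer is 10."],
    step_labels=[False, False, True],
    final_answer_correct=True,
)
print(outcome_label(sol), first_error(sol))
```

Here the outcome label rewards flawed reasoning, while the step labels show the error starts at step 0.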

The team conducted experiments on the MATH dataset using the GPT-4 base model.

Because there is no simple way to automate process supervision, the team relied on human data labelers to mark whether each step of the model-generated solutions was correct.

They collected a large amount of human feedback data and created the PRM800K dataset, which contains 800,000 step-level labels.

The experiments were run in two regimes, large-scale and small-scale, each with its own advantages and offering a different perspective.

The results show that process supervision is significantly better than outcome supervision and trains a more reliable reward model.

The best model trained with process supervision solves 78.2% of the problems in a representative subset of the MATH test set, significantly outperforming the outcome-supervised model (72.4%) and the majority-voting baseline (69.6%).
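The comparison above is between ways of choosing one answer from many sampled solutions. A minimal sketch follows, assuming each candidate comes with per-step correctness probabilities from a PRM. The paper scores a solution as the probability that every step is correct, computed as the product of the per-step probabilities; majority voting ignores the steps and picks the most frequent final answer. All numbers below are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from math import prod
from typing import List, Tuple

# A candidate is (final_answer, per-step correctness probabilities from a PRM).
Candidate = Tuple[str, List[float]]


def majority_vote(cands: List[Candidate]) -> str:
    """Baseline: return the most common final answer."""
    counts = Counter(answer for answer, _ in cands)
    return counts.most_common(1)[0][0]


def prm_score(step_probs: List[float]) -> float:
    """Probability that every step is correct (product of step probabilities)."""
    return prod(step_probs)


def prm_best_of_n(cands: List[Candidate]) -> str:
    """Return the final answer of the highest-scoring candidate."""
    best_answer, _ = max(cands, key=lambda c: prm_score(c[1]))
    return best_answer


cands = [
    ("12", [0.9, 0.4, 0.5]),
    ("12", [0.8, 0.3, 0.6]),
    ("15", [0.95, 0.9, 0.9]),
    ("12", [0.7, 0.2, 0.9]),
]
print(majority_vote(cands), prm_best_of_n(cands))
```

With these toy numbers, majority voting picks "12" (three votes), while the PRM picks "15" because that is the only solution whose every step looks sound. Multiplying probabilities means a single weak step drags down the whole solution, which matches the idea of checking reasoning step by step.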

The team also shows that a large reward model can reliably stand in for human supervision when training smaller reward models, which makes large-scale data-collection ablations efficient to run.

Active learning also improves the data efficiency of process supervision significantly, by about 2.6x.
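As we understand the paper, its active-learning strategy surfaces "convincing wrong-answer" solutions for human labeling: solutions the current PRM rates highly even though their final answer is wrong. The sketch below shows only that selection step, with hypothetical names and toy data. It does not reproduce the paper's pipeline or its 2.6x figure.

```python
from math import prod
from typing import List, Tuple

# A candidate is (final_answer, per-step correctness probabilities from the current PRM).
Candidate = Tuple[str, List[float]]


def prm_score(step_probs: List[float]) -> float:
    """Solution score: product of per-step correctness probabilities."""
    return prod(step_probs)


def select_for_labeling(cands: List[Candidate], correct_answer: str, k: int) -> List[Candidate]:
    """Pick the k wrong-answer solutions that the current PRM finds most convincing."""
    wrong = [c for c in cands if c[0] != correct_answer]
    wrong.sort(key=lambda c: prm_score(c[1]), reverse=True)
    return wrong[:k]


cands = [
    ("12", [0.9, 0.9]),  # correct answer: not useful to surface here
    ("15", [0.9, 0.8]),  # wrong but convincing (score 0.72)
    ("7", [0.3, 0.5]),   # wrong and unconvincing (score 0.15)
    ("15", [0.6, 0.6]),  # wrong, moderately convincing (score 0.36)
]
print(select_for_labeling(cands, correct_answer="12", k=2))
```

The intuition is that labeling solutions the reward model already rejects teaches it little, while its confident mistakes carry the most information per label.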

The team also discussed several key advantages of process supervision.

First, it provides more precise feedback, which makes credit assignment easier. Second, from an AI alignment standpoint, process supervision is more likely to produce interpretable reasoning.

To evaluate the model’s generalization ability, the team also tested it on AP Physics, AP Calculus, AP Chemistry, and AMC exam questions.

Results show that models trained with process supervision continue to outperform on these new problems, demonstrating their robustness to moderate distribution shifts.

Looking back at the paper after a year of rapid progress in large models, some researchers point out that its ideas no longer seem especially new:

The key idea is the process reward model, which can evaluate each step or token individually, not just the final result.

But as netizens said, this paper is ultimately a step towards OpenAI o1.

o1 represents the “paradigm shift from memorizing answers to memorizing reasoning.”

Tsinghua and Peking University alumni lead o1-mini

In addition to Ilya Sutskever, the team behind o1 has also attracted a lot of attention.

The full list on the official website is divided into two parts: reasoning research and reasoning technical safety. A quick look shows well over 100 people.

Here we focus mainly on reasoning research.

Foundational contributors: 21 people;
Leadership: 7 people;
Core contributors: 46 people;
Contributors: 82 people;
Project managers: 2 people;
Executive leadership: 8 people;
Supporting leadership: 8 people.

Among the foundational contributors we also saw many familiar names and Chinese faces.

Jason Wei, a researcher at OpenAI, previously worked at Google Brain. He originated chain-of-thought prompting and also took part in research on the emergent abilities of large models and on GPT-4.

Shengjia Zhao graduated from Tsinghua University with a bachelor's degree, then went to Stanford to pursue a doctorate, and came to OpenAI after graduation in 2022. According to his personal introduction, he is keen on training large models. He is one of the core authors of ChatGPT, GPT-4, and GPT-4o mini.

Ren Hongyu graduated from Peking University in 2018 and then went to Stanford for a PhD in computer science, focusing on large language models. Before joining OpenAI, he worked at tech giants such as Microsoft, Nvidia, Google, and Apple. He is a core contributor to GPT-4o and the lead of GPT-4o mini, mainly teaching models to think faster, harder, and more sharply.

When the model was first released, he said that o1-mini was his favorite model.

These two Tsinghua and Peking University alumni appear to be the main leads on o1-mini.

Francis Song graduated from Yale and Harvard and worked as an assistant researcher at NYU in computational neuroscience. After four years at DeepMind, he joined OpenAI in 2022.

Wenda Zhou received a bachelor's degree from the University of Cambridge and a PhD from Columbia University. She worked as a researcher at Simons/NYU before joining OpenAI last year.

Kevin Yu graduated from UC Berkeley and has worked at NASA.

There is also a Chinese face in Leadership.

Mark Chen is currently the Vice President of Research at OpenAI (Frontier). He studied mathematics and computer science at MIT and worked as a quantitative research partner at Integral Technology.

Finally, the full list is also attached.

△ Reasoning research

△ Reasoning technical safety

Altman: We already have a head start on the next few years

Incidentally, Sam Altman gave another public interview a few days ago and talked about the latest model.

He said that although the o1 model can achieve excellent results in competitions such as IOI and IMO, the focus should not be on AI being good at exams, but on its ability to help researchers, such as discovering new materials faster, finding ways to treat diseases, and so on.

This is the beginning of a new paradigm, very early but very important.

Describing his vision for the future, he said there will be two basic commodities: intelligence, meaning the ability to be creative and do intellectual work, and energy, meaning the ability to make those things happen in the world.

As for the progress of large models, he said it has not slowed down, and that OpenAI already has a head start on the next few years.

Reference links:
[1] https://arxiv.org/abs/2305.20050
[2] https://openai.com/openai-o1-contributions/
[3] https://x.com/rohanpaul_ai/status/1835427161370738983?s=46&t=iTysI4vQLQqCNJjSmBODPw
[4] https://x.com/EarningsNugget/status/1834800151598453085

-End-
