An OpenAI founding member hand-wrote 1,000 lines of C code to train GPT, with a PyTorch migration tutorial
Mingmin from Aofeisi
Qubit | Official account QbitAI
AI guru Andrej Karpathy has just "returned to work" and immediately delivered a blockbuster:
training GPT in pure C, done in roughly 1,000 lines of code, with no ready-made deep learning framework, everything written by hand.
It has only been released for a few hours and has already received 2.3k stars.
It compiles and runs right away, and its results match the PyTorch reference implementation.
Karpathy's example uses GPT-2, but the approach also applies to Llama 2, Gemma, and others.
After the project was released, he also gave a tutorial on migrating from PyTorch to C.
Netizens said: He doesn’t even use C++...
He even shared a prompt for getting a large model to do the same thing. Some people are already trying it with Devin.
Hand-implementing the forward and backward pass for each layer
The reason for choosing GPT-2 is simple: its model weights are publicly available, and it uses the standard stacked Transformer architecture.
Core highlights of the project include:
- Training LLMs directly in C/CUDA, at speeds close to PyTorch
- Speeding up the CPU version with SIMD instructions such as AVX2 and NEON (a rough sketch of that style of code follows this list)
- Supporting more advanced architectures such as Llama 2 and Gemma
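For a feel of the SIMD item above, here is a minimal, illustrative sketch (not code from llm.c; the function name and structure are assumptions) of an AVX2+FMA dot product over float arrays in plain C; compile with -mavx2 -mfma:

```c
#include <immintrin.h>

// Dot product of two float arrays, 8 lanes at a time with AVX2 + FMA.
float dot_avx2(const float* a, const float* b, int n) {
    __m256 acc = _mm256_setzero_ps();
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_fmadd_ps(va, vb, acc);  // acc += va * vb
    }
    // horizontal sum of the 8 accumulator lanes
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; k++) sum += lanes[k];
    // scalar tail for leftover elements
    for (; i < n; i++) sum += a[i] * b[i];
    return sum;
}
```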
Karpathy explained that all the required memory is allocated once at the start, so the memory footprint stays constant throughout training; only the data streaming through it changes from batch to batch.
The key is to implement the forward and backward pass of every single layer by hand and then chain them together. For example, layer normalization (layernorm) gets its own forward pass and backward pass.
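To give a concrete feel for what such a hand-written layer looks like, below is a minimal layernorm forward pass in plain C over a (B, T, C) activation tensor. It is a sketch in the spirit of llm.c; the exact signature in the repo may differ, and the backward pass is omitted here.

```c
#include <math.h>

// Layernorm forward over a (B, T, C) tensor.
// inp/out: (B, T, C); weight/bias: (C); mean/rstd: (B, T), cached for backward.
void layernorm_forward(float* out, float* mean, float* rstd,
                       const float* inp, const float* weight, const float* bias,
                       int B, int T, int C) {
    const float eps = 1e-5f;
    for (int b = 0; b < B; b++) {
        for (int t = 0; t < T; t++) {
            const float* x = inp + (b * T + t) * C;
            // mean over the channel dimension
            float m = 0.0f;
            for (int i = 0; i < C; i++) m += x[i];
            m /= C;
            // variance over the channel dimension
            float v = 0.0f;
            for (int i = 0; i < C; i++) { float d = x[i] - m; v += d * d; }
            v /= C;
            float s = 1.0f / sqrtf(v + eps);
            // normalize, then scale and shift
            float* o = out + (b * T + t) * C;
            for (int i = 0; i < C; i++) o[i] = (x[i] - m) * s * weight[i] + bias[i];
            // cache statistics for the backward pass
            mean[b * T + t] = m;
            rstd[b * T + t] = s;
        }
    }
}
```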
Once you have all the layers, string them all together.
Karpathy said this was tedious and painful to write, because you have to make sure all the pointers and tensor offsets are arranged correctly.
On the left of the figure below, a single one-dimensional memory array is allocated, and all the model weights and activations are pointed into it.
On the right, the pointer arithmetic has to be done very carefully.
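A rough sketch of that idea: one malloc up front, then pointer arithmetic carves the buffer into the individual tensors. The struct fields and sizes below are illustrative only, not the actual llm.c parameter layout.

```c
#include <stdlib.h>

// A handful of illustrative parameter tensors, all pointing into one buffer.
typedef struct {
    float* wte;   // token embedding table,    (V, C)
    float* wpe;   // position embedding table, (maxT, C)
    float* ln1w;  // layernorm weights,        (L, C)
    float* ln1b;  // layernorm biases,         (L, C)
} ParamTensors;

// Allocate one flat float array and point each tensor into it.
float* alloc_params(ParamTensors* p, int V, int maxT, int L, int C) {
    size_t sizes[] = { (size_t)V * C, (size_t)maxT * C, (size_t)L * C, (size_t)L * C };
    size_t total = 0;
    for (int i = 0; i < 4; i++) total += sizes[i];
    float* memory = (float*)malloc(total * sizeof(float));  // single allocation for the whole run
    float* ptr = memory;
    p->wte  = ptr; ptr += sizes[0];
    p->wpe  = ptr; ptr += sizes[1];
    p->ln1w = ptr; ptr += sizes[2];
    p->ln1b = ptr; ptr += sizes[3];
    return memory;  // caller frees this once, at the very end
}
```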
Once forward/backward propagation is established, the rest is easy.
But at this point, Karpathy felt he had reached the most interesting part.
I am porting it to CUDA layer by layer so that it becomes efficient, maybe even on par with PyTorch, but without the heavy dependencies.
From here there are further extensions, such as dropping the precision from fp32 to fp16 or lower, and adding a few more layers (such as RoPE) to support more advanced architectures.
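For reference, RoPE (rotary position embeddings) rotates each pair of channels of a query/key vector by a position-dependent angle. A minimal, illustrative C sketch (not taken from llm.c) of applying it to one head-sized vector:

```c
#include <math.h>

// Apply rotary position embeddings in place to one vector of length head_dim
// (head_dim must be even) at sequence position pos.
void rope_apply(float* vec, int head_dim, int pos) {
    for (int i = 0; i < head_dim; i += 2) {
        // rotation frequency decays with the channel index
        float freq  = powf(10000.0f, -(float)i / head_dim);
        float angle = pos * freq;
        float c = cosf(angle), s = sinf(angle);
        float x0 = vec[i], x1 = vec[i + 1];
        vec[i]     = x0 * c - x1 * s;
        vec[i + 1] = x0 * s + x1 * c;
    }
}
```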
Karpathy said a video with a more detailed explanation will be released later.
More of the code is shown in detail on the project's GitHub page.
Later he also added a tutorial on how to migrate from PyTorch to C.
Netizen: Is this related to LLM OS?
A few days ago, Karpathy, who had "disappeared" for a while, suddenly tweeted that he had been off the internet for some time and had not been feeling great overall.
On the fourth day after plugging back into the online world, he dropped this new project, giving netizens a bit of a shock.
Beyond the usual chorus of "amazing" and "awesome", people are mainly focused on three aspects of the new project.
First, why not use Rust?
Karpathy said he is also learning Rust, but still thinks C is excellent.
It's simple, clean, light, beautiful, and the best language.
Second, can AI programmers write the same project?
It is worth mentioning that Karpathy also shared a prompt, saying you can try handing the task to an LLM agent.
What current models generate is not that good yet, but it may be worth revisiting in a year or two. And if that works...
"Maybe AGI is coming?"
Some netizens have already started trying it with Devin.
One of them worried that Devin would simply find Karpathy's project and copy it outright; so far, Devin has not done that.
Karpathy, however, said that what worries him more is this: LLM agents may well be able to solve the task in one or two years, but by then all the related code and discussion will have seeped into the training data in one form or another, which would make the results less than ideal.
Someone added that data curation should be strengthened.
The third widely discussed question: is this project related to the LLM OS?
Karpathy resigned from OpenAI some time ago, planning to push forward his own projects.
At the time, many speculated that he was going to build an LLM OS.
In an interview at the end of March, he discussed the topic again.
He said the path to AGI is now relatively clear and everyone is pushing ahead at full speed; broadly speaking, everyone is working hard to build a "large model operating system (LLM OS)".
I like to compare it to an operating system. You prepare various peripherals and connect them to a new CPU. The peripherals are the different modalities, such as text, images, and audio; the CPU is the language model itself, and it also connects to all the Software 1.0 infrastructure we have already built.
I think everyone is trying to build something like this and then tailor it into a product that works across every sector of the economy.
Now, with this new project released, Karpathy's personal plans are probably getting under way.
A more detailed video walkthrough of the llm.c project will be released in the future, so stay tuned~
GitHub address:
https://github.com/karpathy/llm.c
Reference links:
[1] https://twitter.com/karpathy/status/1777427944971083809
[2] https://twitter.com/karpathy/status/1777493157485437009
[3] https://twitter.com/karpathy/status/1777481372636246491?s=46&t=iTysI4vQLQqCNJjSmBODPw
-over-