OpenAI
OpenAI was founded in 2015 as a research institute by Elon Musk, Sam Altman, Greg Brockman, Ilya Sutskever, Wojciech Zaremba, and John Schulman. Its early research focused on deep reinforcement learning (DRL).
Mission
To ensure that artificial general intelligence benefits all of humanity.
Deep Reinforcement Learning
Deep reinforcement learning (DRL) combines reinforcement learning (RL) with deep neural networks and is a subfield of machine learning.
Research Results
| Year | Result | Description |
|------|--------|-------------|
| 2016 | OpenAI Gym | A toolkit for developing and testing reinforcement learning algorithms (see the example after the table) |
| 2018 | GPT-1 | Generative model architecture |
| 2019 | GPT-2 | 1.5 billion parameters |
| 2020 | GPT-3 | 175 billion parameters |
| 2023 | GPT-4 | Passes the Turing test |
| 2024 | GPT-4o | A model that reasons across text, audio, and video |
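A minimal sketch of how OpenAI Gym is typically used. The environment name "CartPole-v1" and the newer reset/step API (which returns `terminated` and `truncated` separately) are assumptions about the installed Gym version (gym >= 0.26), not something stated in this post.

```python
import gym

# Minimal random-agent loop, assuming a Gym version with the
# (obs, info) reset API and five-value step API (gym >= 0.26).
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy, just to show the API
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

env.close()
print(f"episode return: {total_reward}")
```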
Model significance
- They save training time and training costs
- They can be used by engineers without data science or machine learning expertise
The mathematics behind the model
The structure of an RNN (Recurrent Neural Network)
The output of the RNN layer at time step t is passed as input to the next time step, and the hidden state is also passed forward, allowing the network to store information and propagate it across different parts of the input sequence.
- x is the input at time t
- U is the input-to-hidden weight matrix of the hidden layer h
- h is the hidden state at time t
- V is the hidden-to-output weight matrix of the hidden layer h
- y is the output at time t
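With these symbols, the usual RNN update can be written as follows; note that W (the hidden-to-hidden weight matrix) and the bias terms b_h, b_y are standard parts of the formulation that are not in the list above:

$$h_t = \tanh(U x_t + W h_{t-1} + b_h)$$
$$y_t = V h_t + b_y$$

A minimal NumPy sketch of this forward pass; all dimensions and random weights below are arbitrary choices for illustration only.

```python
import numpy as np

# Minimal RNN forward pass, following the tanh update sketched above.
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 4, 8, 3, 5

U = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights (assumed, see note above)
V = rng.normal(size=(output_dim, hidden_dim))  # hidden-to-output weights
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

x = rng.normal(size=(T, input_dim))            # an input sequence of T time steps
h = np.zeros(hidden_dim)                       # initial hidden state

outputs = []
for t in range(T):
    h = np.tanh(U @ x[t] + W @ h + b_h)        # hidden state carried to the next step
    outputs.append(V @ h + b_y)                # output at time t

print(np.stack(outputs).shape)                 # (T, output_dim)
```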
Main limitations of RNNs
(1) Vanishing and exploding gradients
During backpropagation through time, the gradient is multiplied by the recurrent weights many times, which can make it extremely small or extremely large (a rough numerical illustration follows this list).
(2) Limited context
The input sequence is processed one element at a time, so only a limited amount of context can be captured.
(3) Difficult to parallelize
An RNN is inherently sequential, which makes its computation hard to parallelize and therefore unable to take full advantage of GPU (Graphics Processing Unit) acceleration.
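As a rough numerical illustration of point (1): repeatedly multiplying a gradient by the same recurrent weight matrix drives it toward zero when the matrix's singular values are below 1 and blows it up when they are above 1. The matrices and step count below are arbitrary choices for the sketch, not taken from the post.

```python
import numpy as np

# Illustration of vanishing / exploding gradients:
# backpropagation through time multiplies the gradient by the recurrent
# weight matrix once per time step.
rng = np.random.default_rng(0)
hidden_dim, steps = 8, 50

grad = rng.normal(size=hidden_dim)

for scale, label in [(0.5, "vanishing"), (1.5, "exploding")]:
    W = scale * np.eye(hidden_dim)   # toy recurrent matrix with singular values = scale
    g = grad.copy()
    for _ in range(steps):
        g = W.T @ g                  # one step of backpropagation through the recurrence
    print(f"{label}: gradient norm after {steps} steps = {np.linalg.norm(g):.3e}")
```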