Reading experience of "Deep Reinforcement Learning in Action"

Advantages: The book focuses on theoretical explanation and is quite clear. Even in the code sections, it explains the execution flow of the program. Overall, it is a very good book.

Flaws: The book's structure involves a lot of nesting: later chapters incorporate the content of earlier chapters. Compared with typical domestic books, the line of reasoning is harder to follow. This is not because the content is especially profound, but because the style of exposition is unfamiliar.

基础篇-导出.pdf (276.21 KB, downloads: 1)

Mind map

Deep Reinforcement Learning in Action

  1. What is Reinforcement Learning
    1. The computer languages of the future will be more focused on goals and less focused on the procedures specified by the programmer.
    2. Deep neural networks have many layers
    3. Reinforcement learning is a general framework for representing and solving control tasks
    4. Deep Learning
      1. Reinforcement Learning
        • Control Tasks
    5. Common tasks such as image classification belong to supervised learning
  2. Markov Decision Process
    1. PyTorch Deep Learning Framework
      1. Reward system
      2. Greedy policy
      3. Selection policy
    2. Building networks with PyTorch
      1. Automatic differentiation
        • Building the model (a small autograd sketch follows the mind map)
    3. The neural network generates the expected reward for each possible action.
    4. Value and Policy Functions
      1. Policy Function
        • Optimal policy
          • Value Function
  3. Deep Q Network
    1. Q Function
      1. State
        • Policy
          • Reward
    2. Navigating with Q-learning
      1. The Gridworld game
      2. Hyperparameters
        • Parameters that control how a machine learning algorithm is trained
      3. Discount Factor
        • Controls how much the agent discounts future rewards when making decisions
      4. Build the network
        • Three-layer network
          • 164 (input layer), 150 (hidden layer), 4 (output layer)
      5. Gridworld Game Engine
        • Code
      6. Constructing a Neural Network for the Q Function
        • Create the neural network model, define the loss function and learning rate, build an optimizer, and define a few hyperparameters
        • PyTorch code implementation (a sketch follows the mind map)
    3. Preventing catastrophic forgetting and experience replay
      1. In essence, catastrophic forgetting happens when very similar state-action pairs (with the same goal) lead to different outcomes, so each new update overwrites what was learned from earlier experiences and the algorithm fails to learn
      2. Experience replay is a way to alleviate the main problem of online training algorithms, catastrophic forgetting (a buffer sketch follows the mind map)
      3. DQN code implementation - DQN loss graph
    4. Improve stability with target networks
      1. Using the Q values from the target network to train the main Q-network improves training stability
      2. Code (a target-network sketch follows the mind map)
        • Compared with the previous training results, convergence is faster
  4. Policy Gradient Method
    1. The policy function as a neural network
    2. Policy Gradient Algorithm
      1. Define your goals
        • Neural networks require an objective function that is differentiable with respect to the network weights (parameters)
      2. Action reinforcement
        • A single action is sampled from the policy network's probability distribution, and its probability is then adjusted according to the reward it receives
      3. Log probability
      4. Credit assignment
        • The Gridworld policy network being trained takes a 64-dimensional state vector as input and outputs a 4-dimensional action probability distribution
    3. Working with OpenAI Gym
      1. OpenAI Gym is a suite of open-source environments with a general API that is well suited for testing reinforcement learning algorithms.
      2. The CartPole environment belongs to OpenAI Gym's classic-control suite
    4. REINFORCE algorithm
      1. Creating a policy network
      2. Agent Interaction with the Environment
      3. Training the model
        • Calculate the action probabilities, compute the discounted future rewards, compute the loss, and backpropagate (a CartPole REINFORCE sketch follows the mind map)
      4. Complete training cycle, code implementation
  5. Actor-Critic Algorithm
    1. Introduction
      1. The algorithm improves sample efficiency and reduces variance
    2. Reconstructing the value and policy functions
      1. Q-learning learns directly from the information (rewards) available in the environment
    3. Distributed training
      1. Python's multiprocessing can be used to speed up the training algorithm
        • Code (a multiprocessing sketch follows the mind map)
    4. Advantage actor-critic algorithm
      1. This book describes in detail the code development process and program operation logic
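
To make a few of the code points in the mind map more concrete, here are some short Python/PyTorch sketches. They are my own minimal illustrations rather than the book's code; any layer sizes, hyperparameters, or helper names that do not appear in the outline above are assumptions. First, the automatic differentiation that model building in PyTorch relies on:

import torch

# PyTorch records operations on tensors created with requires_grad=True and
# computes gradients automatically when backward() is called.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # fills x.grad with dy/dx
print(x.grad)        # tensor([4., 6.])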
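
The three-layer Q-network from the chapter 3 outline, with the layer sizes listed there (164 input, 150 hidden, 4 output). The ReLU activation, MSE loss, Adam optimizer, and hyperparameter values are assumptions rather than the book's exact choices:

import torch
import torch.nn as nn

# A fully connected network mapping a flattened Gridworld state vector to
# one Q value per action.
q_net = nn.Sequential(
    nn.Linear(164, 150),
    nn.ReLU(),
    nn.Linear(150, 4),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

gamma = 0.9    # discount factor: how strongly future rewards are discounted
epsilon = 1.0  # epsilon-greedy exploration rate, annealed as training proceeds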
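
Experience replay, sketched as a fixed-size buffer of transitions from which random minibatches are drawn for training. The buffer size, batch size, and function names are illustrative:

import random
from collections import deque

import torch

# Storing transitions and training on random minibatches breaks the correlation
# between consecutive updates and mitigates catastrophic forgetting.
replay_buffer = deque(maxlen=1000)
batch_size = 200

def store(state, action, reward, next_state, done):
    replay_buffer.append((state, action, reward, next_state, done))

def sample_minibatch():
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    return (torch.stack(states),
            torch.tensor(actions),
            torch.tensor(rewards, dtype=torch.float32),
            torch.stack(next_states),
            torch.tensor(dones, dtype=torch.float32))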
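
The target network, continuing the Q-network sketch above. A periodically synchronized copy of the Q-network supplies the bootstrap targets, so the regression target does not shift on every update; the synchronization interval here is an assumption:

import copy

import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(164, 150), nn.ReLU(), nn.Linear(150, 4))
target_net = copy.deepcopy(q_net)  # frozen copy used only to compute targets
sync_every = 500                   # copy q_net's weights into target_net every 500 steps

def q_learning_targets(rewards, next_states, dones, gamma=0.9):
    # Target: r + gamma * max_a' Q_target(s', a'), with no bootstrapping at terminal states.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * max_next_q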
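
One REINFORCE update on CartPole: interact with the Gym environment, record the log probability of each sampled action, compute the discounted future return for every step, and backpropagate the policy-gradient loss. The network sizes are assumptions, and the code uses the Gymnasium API (reset returns (obs, info), step returns a 5-tuple); the book's original code used the older gym interface:

import gymnasium as gym
import torch
import torch.nn as nn

# Policy network: 4-dimensional CartPole state in, probabilities over 2 actions out.
policy_net = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
gamma = 0.99

# Run one episode, keeping the log probability of every sampled action.
env = gym.make("CartPole-v1")
state, _ = env.reset()
log_probs, rewards, done = [], [], False
while not done:
    probs = policy_net(torch.from_numpy(state).float())
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()                   # sample an action from the policy
    log_probs.append(dist.log_prob(action))  # remember its log probability
    state, reward, terminated, truncated, _ = env.step(action.item())
    rewards.append(reward)
    done = terminated or truncated

# Discounted future return G_t for every time step of the episode.
returns, G = [], 0.0
for r in reversed(rewards):
    G = r + gamma * G
    returns.insert(0, G)
returns = torch.tensor(returns)
returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # optional normalization

# REINFORCE loss: raise the log probability of actions in proportion to their return.
loss = -(torch.stack(log_probs) * returns).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()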
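
Finally, the distributed-training point from the chapter 5 outline: several worker processes sharing one model through torch.multiprocessing. The worker body here is only a placeholder; in the book's advantage actor-critic code each worker runs its own environment and applies gradient updates to the shared model:

import torch
import torch.multiprocessing as mp

def worker(rank, shared_model):
    # Each process would build its own environment here, run episodes, and
    # update the shared model's parameters; this placeholder just reports in.
    n_params = sum(p.numel() for p in shared_model.parameters())
    print(f"worker {rank} sees a shared model with {n_params} parameters")

if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    model.share_memory()          # make the parameters visible to all processes
    processes = []
    for rank in range(4):         # the number of workers is illustrative
        p = mp.Process(target=worker, args=(rank, model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()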
Latest reply

The OP was very attentive and even made a mind map. Published on 2023-11-28 15:59
