Reading experience of "Deep Reinforcement Learning in Action"

Advantages: The book focuses on theoretical explanation and is quite clear. Even in the code sections, it explains the execution flow of the program. Overall, it is a very good book.

Flaws: The book's structure involves a lot of nesting: later chapters incorporate the content of earlier chapters. Compared with typical domestic books, the line of reasoning is harder to follow. This is not because the content is especially profound, but because the style of exposition is unfamiliar.

基础篇-导出.pdf (276.21 KB, downloads: 1)

Mind map

Deep Reinforcement Learning in Action

  1. What is Reinforcement Learning
    1. The computer languages of the future will be more focused on goals and less focused on the procedures specified by the programmer.
    2. Deep neural networks have many layers
    3. Reinforcement learning is a general framework for representing and solving control tasks
    4. Deep Learning
      1. Reinforcement Learning
        • Control Tasks
    5. Common tasks such as image classification belong to supervised learning
  2. Markov Decision Process
    1. PyTorch Deep Learning Framework
      1. Reward system
      2. Greedy policy
      3. Selection policy
    2. Building networks with PyTorch
      1. Automatic differentiation
        • Building the model (a small autograd sketch follows the mind map)
    3. The neural network generates the expected reward for each possible action.
    4. Value and Policy Functions
      1. Policy Function
        • Optimal policy
          • Value Function
  3. Deep Q Network
    1. Q Function
      1. State
        • Policy
          • Reward
    2. Navigating with Q-learning
      1. The Gridworld game
      2. Hyperparameters
        • Parameters that control how a machine learning algorithm is trained
      3. Discount Factor
        • Controls how much the agent discounts future rewards when making decisions
      4. Build the network
        • Three-layer network
          • 164 (input layer), 150 (hidden layer), 4 (output layer)
      5. Gridworld Game Engine
        • Code
      6. Constructing a Neural Network for the Q Function
        • Create the neural network model, define the loss function and learning rate, build an optimizer, and define a few hyperparameters
        • PyTorch code implementation (a sketch follows the mind map)
    3. Preventing catastrophic forgetting and experience replay
      1. In essence, catastrophic forgetting happens when very similar state-action pairs (with the same goal) lead to different outcomes, so each new update overwrites what was learned from earlier experiences and the algorithm fails to learn
      2. Experience replay is a way to alleviate the main problem of online training algorithms, catastrophic forgetting (a buffer sketch follows the mind map)
      3. DQN code implementation - DQN loss graph
    4. Improve stability with target networks
      1. Using the Q values from the target network to train the main Q-network improves training stability
      2. Code (a target-network sketch follows the mind map)
        • Compared with the previous training results, convergence is faster
  4. Policy Gradient Method
    1. The policy function as a neural network
    2. Policy Gradient Algorithm
      1. Define your goals
        • Neural networks require an objective function that is differentiable with respect to the network weights (parameters)
      2. Action reinforcement
        • A single action is sampled from the policy network's probability distribution, and its probability is then adjusted according to the reward it receives
      3. Log probability
      4. Credit assignment
        • The Gridworld policy network being trained takes a 64-dimensional state vector as input and outputs a 4-dimensional action probability distribution
    3. Working with OpenAI Gym
      1. OpenAI Gym is a suite of open-source environments with a general API that is well suited for testing reinforcement learning algorithms.
      2. The CartPole environment belongs to OpenAI Gym's classic-control suite
    4. REINFORCE algorithm
      1. Creating a policy network
      2. Agent Interaction with the Environment
      3. Training the model
        • Calculate the action probabilities, compute the discounted future rewards, compute the loss, and backpropagate (a CartPole REINFORCE sketch follows the mind map)
      4. Complete training cycle, code implementation
  5. Actor-Critic Algorithm
    1. Introduction
      1. The algorithm improves sample efficiency and reduces variance
    2. Reconstructing the value and policy functions
      1. Q-learning learns directly from the information (rewards) available in the environment
    3. Distributed training
      1. Python's multiprocessing can be used to speed up the training algorithm
        • Code (a multiprocessing sketch follows the mind map)
    4. Advantage actor-critic algorithm
      1. This book describes in detail the code development process and program operation logic
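
To make a few of the code points in the mind map more concrete, here are some short Python/PyTorch sketches. They are my own minimal illustrations rather than the book's code; any layer sizes, hyperparameters, or helper names that do not appear in the outline above are assumptions. First, the automatic differentiation that model building in PyTorch relies on:

import torch

# PyTorch records operations on tensors created with requires_grad=True and
# computes gradients automatically when backward() is called.
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2
y.backward()         # fills x.grad with dy/dx
print(x.grad)        # tensor([4., 6.])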
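
The three-layer Q-network from the chapter 3 outline, with the layer sizes listed there (164 input, 150 hidden, 4 output). The ReLU activation, MSE loss, Adam optimizer, and hyperparameter values are assumptions rather than the book's exact choices:

import torch
import torch.nn as nn

# A fully connected network mapping a flattened Gridworld state vector to
# one Q value per action.
q_net = nn.Sequential(
    nn.Linear(164, 150),
    nn.ReLU(),
    nn.Linear(150, 4),
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

gamma = 0.9    # discount factor: how strongly future rewards are discounted
epsilon = 1.0  # epsilon-greedy exploration rate, annealed as training proceeds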
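
Experience replay, sketched as a fixed-size buffer of transitions from which random minibatches are drawn for training. The buffer size, batch size, and function names are illustrative:

import random
from collections import deque

import torch

# Storing transitions and training on random minibatches breaks the correlation
# between consecutive updates and mitigates catastrophic forgetting.
replay_buffer = deque(maxlen=1000)
batch_size = 200

def store(state, action, reward, next_state, done):
    replay_buffer.append((state, action, reward, next_state, done))

def sample_minibatch():
    batch = random.sample(replay_buffer, batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    return (torch.stack(states),
            torch.tensor(actions),
            torch.tensor(rewards, dtype=torch.float32),
            torch.stack(next_states),
            torch.tensor(dones, dtype=torch.float32))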
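
The target network, continuing the Q-network sketch above. A periodically synchronized copy of the Q-network supplies the bootstrap targets, so the regression target does not shift on every update; the synchronization interval here is an assumption:

import copy

import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(164, 150), nn.ReLU(), nn.Linear(150, 4))
target_net = copy.deepcopy(q_net)  # frozen copy used only to compute targets
sync_every = 500                   # copy q_net's weights into target_net every 500 steps

def q_learning_targets(rewards, next_states, dones, gamma=0.9):
    # Target: r + gamma * max_a' Q_target(s', a'), with no bootstrapping at terminal states.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * max_next_q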
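
One REINFORCE update on CartPole: interact with the Gym environment, record the log probability of each sampled action, compute the discounted future return for every step, and backpropagate the policy-gradient loss. The network sizes are assumptions, and the code uses the Gymnasium API (reset returns (obs, info), step returns a 5-tuple); the book's original code used the older gym interface:

import gymnasium as gym
import torch
import torch.nn as nn

# Policy network: 4-dimensional CartPole state in, probabilities over 2 actions out.
policy_net = nn.Sequential(
    nn.Linear(4, 128), nn.ReLU(),
    nn.Linear(128, 2), nn.Softmax(dim=-1),
)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
gamma = 0.99

# Run one episode, keeping the log probability of every sampled action.
env = gym.make("CartPole-v1")
state, _ = env.reset()
log_probs, rewards, done = [], [], False
while not done:
    probs = policy_net(torch.from_numpy(state).float())
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()                   # sample an action from the policy
    log_probs.append(dist.log_prob(action))  # remember its log probability
    state, reward, terminated, truncated, _ = env.step(action.item())
    rewards.append(reward)
    done = terminated or truncated

# Discounted future return G_t for every time step of the episode.
returns, G = [], 0.0
for r in reversed(rewards):
    G = r + gamma * G
    returns.insert(0, G)
returns = torch.tensor(returns)
returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # optional normalization

# REINFORCE loss: raise the log probability of actions in proportion to their return.
loss = -(torch.stack(log_probs) * returns).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()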
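
Finally, the distributed-training point from the chapter 5 outline: several worker processes sharing one model through torch.multiprocessing. The worker body here is only a placeholder; in the book's advantage actor-critic code each worker runs its own environment and applies gradient updates to the shared model:

import torch
import torch.multiprocessing as mp

def worker(rank, shared_model):
    # Each process would build its own environment here, run episodes, and
    # update the shared model's parameters; this placeholder just reports in.
    n_params = sum(p.numel() for p in shared_model.parameters())
    print(f"worker {rank} sees a shared model with {n_params} parameters")

if __name__ == "__main__":
    model = torch.nn.Linear(4, 2)
    model.share_memory()          # make the parameters visible to all processes
    processes = []
    for rank in range(4):         # the number of workers is illustrative
        p = mp.Process(target=worker, args=(rank, model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()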
Latest reply

The OP was very attentive and even made a mind map. Published on 2023-11-28 15:59
