◎ Editor's Recommendation

• Easy to understand, with detailed explanations
This book continues the writing style of its predecessor, using plain language and a large number of intuitive diagrams to explain each topic in detail. It helps readers deepen their understanding of modern deep learning frameworks such as PyTorch, TensorFlow, and Chainer, and further consolidate their knowledge of Python programming and software development.

• Analyzing how a deep learning framework works by "creating it from scratch"
The book builds a deep learning framework from scratch, so that readers come to understand the technologies and mechanisms inside such frameworks while actually running the code. Through this experience, readers grasp the essence of a deep learning framework.

• Incremental development
The book divides the complex task of building a deep learning framework into 60 steps that progress incrementally. Readers get working results at every step of the hands-on process, which keeps motivation high.

◎ Content Introduction

Deep learning frameworks contain remarkable technologies and fascinating mechanisms. This book aims to demystify them, help readers understand the technology correctly, and let them appreciate how interesting it is. To this end, the book guides readers in building a deep learning framework from scratch: DeZero. DeZero is the book's original framework, and it implements the functionality of a modern deep learning framework with minimal code. The book completes this framework over 60 steps. Along the way, readers deepen their understanding of modern deep learning frameworks such as PyTorch, TensorFlow, and Chainer, and see the essence of a deep learning framework. The book follows the style of "Introduction to Deep Learning: Theory and Implementation Based on Python", with plain language, concise code, and detailed explanations. While building their own framework, readers can also further consolidate their knowledge of Python programming and software development. This book is suitable for readers who are interested in deep learning frameworks.
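To give a concrete feel for the "minimal code" approach described above, here is a small sketch, written for this page rather than taken from the book, of the Variable-as-a-box and Function pattern that Steps 1 and 2 of the table of contents introduce. The class and method names mirror common DeZero conventions, but the exact code is an assumption.

```python
import numpy as np

class Variable:
    # A "box" that simply holds a NumPy array (the idea behind Step 1).
    def __init__(self, data):
        self.data = data

class Function:
    # Base class for functions that take and return Variables (Step 2).
    def __call__(self, input):
        x = input.data          # take the data out of the box
        y = self.forward(x)     # concrete computation defined by a subclass
        return Variable(y)      # wrap the result in a new box

    def forward(self, x):
        raise NotImplementedError

class Square(Function):
    def forward(self, x):
        return x ** 2

x = Variable(np.array(10.0))
y = Square()(x)
print(y.data)  # 100.0
```

Backpropagation, variable-length arguments, operator overloading, and the rest of DeZero's features are then layered onto these two classes over the remaining steps.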
◎ Table of Contents

Preface

Stage 1 Automatic Differentiation 1

Step 1 Variables as "boxes" 3
  1.1 What are variables 3
  1.2 Implementing the Variable class 4
  1.3 (Supplement) NumPy multidimensional arrays 6
Step 2 Creating functions of variables 8
  2.1 What is a function 8
  2.2 Implementing the Function class 9
  2.3 Using the Function class 10
Step 3 Calling functions in succession 13
  3.1 Implementing the Exp function 13
  3.2 Calling functions in succession 14
Step 4 Numerical differentiation 16
  4.1 What is a derivative 16
  4.2 Implementing numerical differentiation 17
  4.3 Derivatives of composite functions 20
  4.4 Problems with numerical differentiation 21
Step 5 Theory of backpropagation 22
  5.1 The chain rule 22
  5.2 Deriving backpropagation 23
  5.3 Representation with a computational graph 25
Step 6 Manual backpropagation 27
  6.1 Extending the Variable class 27
  6.2 Extending the Function class 28
  6.3 Extending the Square and Exp classes 28
  6.4 Implementing backpropagation 29
Step 7 Automating backpropagation 32
  7.1 Laying the groundwork for automatic backpropagation 33
  7.2 Trying out backpropagation 36
  7.3 Adding a backward method 38
Step 8 From recursion to loops 40
  8.1 The current Variable class 40
  8.2 Implementation using a loop 41
  8.3 Code verification 42
Step 9 Making functions easier to use 43
  9.1 Using them as Python functions 43
  9.2 Simplifying the backward method 45
  9.3 Supporting only ndarray 46
Step 10 Testing 50
  10.1 Unit testing in Python 50
  10.2 Testing backpropagation of the square function 52
  10.3 Automated testing through gradient checking 53
  10.4 Summary of testing 54

Stage 2 Expressing in Natural Code 59

Step 11 Variable-length arguments (forward propagation) 61
  11.1
  11.2 Implementing the Add class 64
Step 12 Variable-length arguments (improvements) 65
  12.1 First improvement: making functions easier to use 65
  12.2 Second improvement: making functions easier to implement 67
  12.3 Implementing the add function 69
Step 13 Variable-length arguments (backpropagation) 70
  13.1 Backpropagation of the Add class with variable-length arguments 70
  13.2 Modifying the Variable class 71
  13.3 Implementing the Square class 73
Step 14 Reusing the same variable 75
  14.1 The cause of the problem 76
  14.2 The solution 77
  14.3 Resetting the derivative 79
Step 15 Complex computational graphs (theory) 81
  15.1 The correct order of backpropagation 82
  15.2 The current DeZero 84
  15.3 Function priority 87
Step 16 Complex computational graphs (implementation) 88
  16.1
  16.2 Retrieving elements in order of "generation" 90
  16.3 The backward method of the Variable class 92
  16.4 Code verification 93
Step 17 Memory management and circular references 97
  17.1 Memory management 97
  17.2 Reference-counting memory management 98
  17.3 Circular references 100
  17.4 The weakref module 102
  17.5 Code verification 104
Step 18 Patterns for reducing memory usage 106
  18.1 Not retaining unnecessary derivatives 106
  18.2 Reviewing the Function class 109
  18.3 Switching with the Config class 110
  18.4 Switching modes 111
  18.5 Switching with the with statement 112
Step 19 Making variables easier to use 116
  19.1 Naming variables 116
  19.2 ndarray instance variables 117
  19.3 The len and print functions 119
Step 20 Operator overloading (1) 122
  20.1 Implementing the Mul class 122
  20.2 Operator overloading 125
Step 21 Operator overloading (2) 128
  21.1 Using with ndarray 128
  21.2 Using with float and int 130
  21.3 Problem 1: when the left operand is a float or int 131
  21.4 Problem 2: when the left operand is an ndarray instance 133
Step 22 Operator overloading (3) 134
  22.1 Negation 135
  22.2 Subtraction 136
  22.3 Division 138
  22.4 Exponentiation 139
Step 23 Packaging 141
  23.1 File structure 142
  23.2 Moving code into the core classes 142
  23.3 Operator overloading 144
  23.4 The actual __init__.py file 146
  23.5 Importing dezero 147
Step 24 Derivatives of complex functions 149
  24.1 The Sphere function 150
  24.2 The Matyas function 151
  24.3 The Goldstein-Price function 152

Stage 3 Implementing Higher-Order Derivatives 161

Step 25 Visualizing computational graphs (1) 163
  25.1 Installing Graphviz 163
  25.2 Describing graphs in the DOT language 165
  25.3 Specifying node attributes 165
  25.4 Connecting nodes 167
Step 26 Visualizing computational graphs (2) 169
  26.1 Example use of the visualization code 169
  26.2 Converting computational graphs to the DOT language 171
  26.3 Converting from the DOT language to images 174
  26.4 Code verification 176
Step 27 Derivatives via Taylor expansion 178
  27.1 Implementing the sin function 178
  27.2 Theory of Taylor expansion 179
  27.3 Implementing the Taylor expansion 180
  27.4 Visualizing the computational graph 182
Step 28 Function optimization 184
  28.1 The Rosenbrock function 184
  28.2 Derivatives 185
  28.3 Implementing gradient descent 186
Step 29 Optimization with Newton's method (manual calculation) 190
  29.1 Theory of optimization with Newton's method 191
  29.2 Implementing optimization with Newton's method 195
Step 30 Higher-order derivatives (preparation) 197
  30.1 Confirmation ①: instance variables of Variable 197
  30.2 Confirmation ②: the Function class 199
  30.3 Confirmation ③: backpropagation in the Variable class 201
Step 31 Higher-order derivatives (theory) 204
  31.1 Calculations performed during backpropagation 204
  31.2 Creating a computational graph for backpropagation 206
Step 32 Higher-order derivatives (implementation) 209
  32.1 The new DeZero 209
  32.2 Backpropagation of the Function class 210
  32.3 Implementing more efficient backpropagation (adding mode-control code) 211
  32.4 Modifying __init__.py 213
Step 33 Optimization with Newton's method (automatic calculation) 215
  33.1 Computing second-order derivatives 215
  33.2 Optimization with Newton's method 217
Step 34 Higher-order derivatives of the sin function 219
  34.1 Implementing the sin function 219
  34.2 Implementing the cos function 220
  34.3 Higher-order derivatives of the sin function 221
Step 35 Computational graphs of higher-order derivatives 225
  35.1 Derivatives of the tanh function 226
  35.2 Implementing the tanh function 226
  35.3 Visualizing the computational graph of higher-order derivatives 227
Step 36 Other uses of DeZero 234
  36.1 Uses of double backprop 234
  36.2 Application examples in deep learning research 236

Stage 4 Creating a Neural Network 243

Step 37 Processing tensors 245
  37.1 Element-wise calculations 245
  37.2 Backpropagation with tensors 247
  37.3 Backpropagation with tensors (supplement) 249
Step 38 Functions that change shape 254
  38.1 Implementing the reshape function 254
  38.2 Calling reshape from a Variable object 258
  38.3 Transposing a matrix 259
  38.4 The actual transpose function (supplement) 262
Step 39 The sum function 264
  39.1 Backpropagation of the sum function 264
  39.2 Implementing the sum function 266
  39.3 axis and keepdims 268
Step 40 Functions that perform broadcasting 272
  40.1 The broadcast_to and sum_to functions 272
  40.2 DeZero's broadcast_to and sum_to functions 275
  40.3 Supporting broadcasting 277
Step 41 Matrix products 280
  41.1 Vector inner products and matrix products 280
  41.2 Checking matrix shapes 282
  41.3 Backpropagation of the matrix product 282
Step 42 Linear regression 288
  42.1 A toy dataset 288
  42.2 Theory of linear regression 289
  42.3 Implementing linear regression 291
  42.4 DeZero's mean_squared_error function (supplement) 295
Step 43 Neural networks 298
  43.1 The linear function in DeZero 298
  43.2 Nonlinear datasets 301
  43.3 Activation functions and neural networks 302
  43.4 Implementing a neural network 303
Step 44 Layers that aggregate parameters 307
  44.1 Implementing the Parameter class 307
  44.2 Implementing the Layer class 309
  44.3 Implementing the Linear class 312
  44.4 Implementing neural networks with Layer 314
Step 45 Layers that aggregate layers 316
  45.1 Extending the Layer class 316
  45.2 The Model class 319
  45.3 Using Model to solve a problem 321
  45.4 The MLP class 323
Step 46 Updating parameters with an Optimizer 325
  46.1 The Optimizer class 325
  46.2 Implementing the SGD class 326
  46.3 Using the SGD class to solve a problem 327
  46.4 Optimization methods other than SGD 328
Step 47 The softmax function and cross-entropy error 331
  47.1 Functions for slicing 331
  47.2 The softmax function 334
  47.3 The cross-entropy error 337
Step 48 Multi-class classification 340
  48.1 The spiral dataset 340
  48.2 Training code 341
Step 49 The Dataset class and preprocessing 346
  49.1 Implementing the Dataset class 346
  49.2 The case of large datasets 348
  49.3 Data concatenation 349
  49.4 Training code 350
  49.5 Preprocessing the dataset 351
Step 50 A DataLoader for extracting mini-batches 354
  50.1 What is an iterator 354
  50.2 Using DataLoader 358
  50.3 Implementing the accuracy function 359
  50.4 Training code for the spiral dataset 360
Step 51 Training on MNIST 363
  51.1 The MNIST dataset 364
  51.2 Training on MNIST 366
  51.3 Improving the model 368

Stage 5 Advanced Challenges for DeZero 377

Step 52 Supporting GPUs 379
  52.1 Installing and using CuPy 379
  52.2 The cuda module 382
  52.3 Adding code to the Variable, Layer, and DataLoader classes 383
  52.4 Corresponding modifications to functions 386
  52.5 Training MNIST on a GPU 388
Step 53 Saving and loading models 391
  53.1 NumPy's save and load functions 391
  53.2 Flattening the parameters of the Layer class 394
  53.3 The save and load functions of the Layer class 395
Step 54 Dropout and test mode 398
  54.1 What is Dropout 398
  54.2 Inverted Dropout 401
  54.3 Adding a test mode 401
  54.4 Implementing Dropout 402
Step 55 The mechanism of CNNs (1) 404
  55.1 CNN network structure 404
  55.2 The convolution operation 405
  55.3 Padding 407
  55.4 Stride 408
  55.5 Calculating the output size 409
Step 56 The mechanism of CNNs (2) 411
  56.1 Third-order tensors 411
  56.2 Thinking in terms of blocks 412
  56.3 Mini-batch processing 414
  56.4 The pooling layer 415
Step 57 The conv2d and pooling functions 418
  57.1 Expanding with im2col 418
  57.2 Implementing the conv2d function 420
  57.3 Implementing the Conv2d layer 425
  57.4 Implementing the pooling function 426
Step 58 A representative CNN (VGG16) 429
  58.1 Implementing VGG16 429
  58.2 Trained weight data 431
  58.3 Using the trained VGG16 435
Step 59 Processing time-series data with an RNN 438
  59.1 Implementing the RNN layer 438
  59.2 Implementing the RNN model 442
  59.3 A method for cutting connections 445
  59.4 Predicting a sine wave 446
Step 60 LSTM and the data loader 451
  60.1 A data loader for time-series data 451
  60.2 Implementing the LSTM layer 453

Appendix A In-place operations (supplement to Step 14) 463
  A.1 Confirming the problem 463
  A.2 About copying and overwriting 464
  A.3 Backpropagation in DeZero 465
Appendix B Implementing the get_item function (supplement to Step 47) 466
Appendix C Running on Google Colaboratory 469
Postscript 473
References 477