
Hands-on Deep Learning (PyTorch Edition) - [Reading Activity: Sharing Experience] Implementing a Multilayer Perceptron

 

Introduction

Although the reading activity has ended, I still want to work through everything in this book, so I will keep posting updates later. In this chapter we learned about multilayer perceptrons (MLPs). Unlike a single-layer perceptron, an MLP can handle problems that are not linearly separable, such as XOR. Each layer has its own weights and bias: the input data is fed into the hidden layer, the hidden layer extracts features from the input (its size can be specified freely), those features then serve as the input to the next layer, and the final layer produces the classification.
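To make the XOR point concrete, here is a minimal sketch of my own (it is not from the book) of a tiny MLP learning XOR; the hidden width, learning rate, and iteration count are arbitrary choices, and an unlucky random initialization can occasionally fail to converge:

import torch
from torch import nn

# The four XOR input/label pairs.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# A 2-8-1 MLP; a single linear layer cannot represent this mapping.
net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.5)

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    optimizer.step()

print((torch.sigmoid(net(X)) > 0.5).float())  # usually [[0.], [1.], [1.], [0.]]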

I ran the perceptron code from the book (listed below) and obtained training curves for different models by adjusting the hidden layer size; a sketch of the sweep I used appears after the code.

I captured runs with hidden layer sizes of 10, 64, 512, and 1024 and compared the training curves of the different models (the screenshots of the plots are not reproduced here).

A few things stand out from these curves. As the hidden layer size grows, the training and test accuracy curves become smoother, though not without limit: there is basically no difference between 512 and 1024. Comparing 10 against 64, or 10 against 512, the final accuracies are actually quite similar; the clearest difference is in the loss. The loss curves for 512 and 1024 nearly coincide, while those for 10, 64, and 512 differ substantially from one another.

This suggests a rule: once the hidden layer size exceeds a certain threshold, meaning the layer already has enough units to capture every feature by which the images can be distinguished, adding more units changes nothing. For example, if a picture can be classified using at most 10 features, then hidden layer sizes of 10 and 20 will behave the same. Below that threshold, however, the loss curve changes noticeably with the hidden layer size. A quick way to see how much capacity each setting adds is to count parameters, as in the snippet below.
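This is my own quick check (not from the book) of the parameter count of the 784 -> h -> 10 network for each hidden size tried above:

# Parameters of a 784 -> h -> 10 MLP: W1 (784*h) + b1 (h) + W2 (h*10) + b2 (10).
for h in [10, 64, 512, 1024]:
    n_params = 784 * h + h + h * 10 + 10
    print(f"hidden size {h:5d}: {n_params:9d} parameters")

Going from 512 to 1024 roughly doubles the parameter count without changing the curves, which is consistent with the threshold reading above.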

import torch
from torch import nn
from d2l import torch as d2l

batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

# Fashion-MNIST: 28*28 = 784 inputs, 10 classes; hidden size 1024 for this run.
num_inputs, num_outputs, num_hiddens = 784, 10, 1024

# Weights start from a small random normal; biases start at zero.
W1 = nn.Parameter(torch.randn(
    num_inputs, num_hiddens, requires_grad=True) * 0.01)
b1 = nn.Parameter(torch.zeros(num_hiddens, requires_grad=True))
W2 = nn.Parameter(torch.randn(
    num_hiddens, num_outputs, requires_grad=True) * 0.01)
b2 = nn.Parameter(torch.zeros(num_outputs, requires_grad=True))

params = [W1, b1, W2, b2]

def relu(X):
    # Element-wise max(X, 0), written by hand instead of torch.relu.
    a = torch.zeros_like(X)
    return torch.max(X, a)

def net(X):
    X = X.reshape((-1, num_inputs))  # flatten each image into a 784-vector
    H = relu(X@W1 + b1)  # "@" stands for matrix multiplication here
    return (H@W2 + b2)

# Per-example cross-entropy; d2l.train_ch3 averages it itself.
loss = nn.CrossEntropyLoss(reduction='none')

num_epochs, lr = 10, 0.1
updater = torch.optim.SGD(params, lr=lr)
d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)

# Show predictions on a few test images.
d2l.predict_ch3(net, test_iter)
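To produce the comparison above, I wrapped the training in a small helper of my own (the book does not include this loop) and iterated over the hidden sizes. It reuses the same d2l helpers; note that nn.Sequential is used here purely for brevity, and its default initialization differs from the scaled-normal initialization above:

import torch
from torch import nn
from d2l import torch as d2l

def train_with_hidden_size(num_hiddens, num_epochs=10, lr=0.1, batch_size=256):
    # Train a 784 -> num_hiddens -> 10 MLP on Fashion-MNIST and plot its curves.
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)
    net = nn.Sequential(nn.Flatten(),
                        nn.Linear(784, num_hiddens), nn.ReLU(),
                        nn.Linear(num_hiddens, 10))
    loss = nn.CrossEntropyLoss(reduction='none')
    updater = torch.optim.SGD(net.parameters(), lr=lr)
    d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, updater)

for size in [10, 64, 512, 1024]:
    train_with_hidden_size(size)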

 
 
