"Hands-on Deep Learning (PyTorch Edition)" 1. Prerequisites
1. Book Introduction
"Hands-On Deep Learning (PyTorch Edition)" is a godfather-level book in the field of deep learning. This is also a major upgraded version of the first edition, which uses the classic PyTorch deep learning framework.
At the same time, this book is also used as a textbook in dozens of universities, which shows that the status of this book in the academic world is almost at the level of a master.
This book has 15 chapters and can be divided into three parts:
Part 1 (Chapters 1-4) covers preparation and basic knowledge.
Part 2 (Chapters 5-10) focuses on modern deep learning techniques.
Part 3 (Chapters 11-15) discusses scalability, efficiency, and applications.
As can be seen from the table of contents, the book starts directly with perceptrons and neural networks and does not cover traditional machine learning methods. This reflects the difference between the two fields: deep learning focuses on neural networks, while classical machine learning focuses on traditional algorithms.
2. Introduction
The first chapter of the book is an introduction that briefly covers the basic concepts of machine learning:
1. Key Components of Machine Learning
- Data
- Model
- Objective Function
- Optimization Algorithm
2. Classification of Machine Learning
- Supervised Learning
- Unsupervised Learning
- Interacting with an Environment
- Reinforcement Learning
3. Development of Deep Learning
4. Successful Cases of Deep Learning
Since this chapter is mostly a high-level overview, a quick read-through is enough, so I won't say much more about it here.
3. Preliminary Knowledge
Deep learning requires some basic skills: all machine learning methods involve extracting information from data, so this chapter covers common practical skills for handling data, including storage, manipulation, and preprocessing.
It is mainly divided into the following parts:
- Data Operations
- Data Preprocessing
- Linear Algebra
- Calculus
- Automatic Differentiation
- Probability
1. Data Operations
This part covers how tensors are created and manipulated in PyTorch. Let's look at a few examples.
Tensor definition:
import torch
x = torch.arange(12)
print(x)
Output:
tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Check out the shape of the tensor:
print(x.shape)
Output:
torch.Size([12])
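We can also check the total number of elements with numel() (a quick addition of mine, not in the book excerpt above):
print(x.numel())
Output:
12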
Change the shape of a tensor:
X = x.reshape(3, 4)
print(X)
Output:
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
There are many other ways to easily define and transform tensors.
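For instance (a few quick illustrations of my own, not copied from the book):
torch.zeros((2, 3, 4))    # a tensor of shape (2, 3, 4) filled with zeros
torch.ones((2, 3, 4))     # the same shape filled with ones
torch.randn(3, 4)         # elements drawn from a standard normal distribution
torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4]])  # build a tensor from a nested Python list
x.reshape(-1, 4)          # -1 lets PyTorch infer that dimension from the others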
PyTorch also supports many operators, such as the common element-wise addition, subtraction, multiplication, division, and exponentiation.
For example:
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y  # the ** operator performs exponentiation
Output:
(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([ 2.,  4.,  8., 16.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))
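Beyond element-wise arithmetic, broadcasting lets tensors of different shapes be combined; a small sketch of my own (not from the book excerpt):
a = torch.arange(3).reshape(3, 1)
b = torch.arange(2).reshape(1, 2)
a + b  # both operands are expanded to shape (3, 2) before the addition
Output:
tensor([[0, 1],
        [1, 2],
        [2, 3]])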
2. Linear Algebra
A scalar is represented by a tensor with only one element. The following code will instantiate two scalars and perform some familiar arithmetic operations, namely addition, multiplication, division, and exponentiation.
x = torch.tensor(3.0)
y = torch.tensor(2.0)
print(x + y, x * y, x / y, x**y)
Output:
tensor(5.) tensor(6.) tensor(1.5000) tensor(9.)
A vector can be viewed as a list of scalar values. These scalar values are called the elements or components of the vector. When vectors represent samples in a dataset, their values have some real-world meaning.
Vectors are represented by one-dimensional tensors. In general, tensors can have arbitrary length, subject to the memory limits of the machine.
We can use subscripts to refer to any element of a vector; for example, x_i refers to the i-th element.
import torch
x = torch.arange(4)
print(x[3])
Output:
tensor(3)
The length of the vector:
len(x)
Output:
4
Just as vectors generalize scalars from order zero to order one, matrices generalize vectors from order one to order two. Matrices, which we usually denote with bold, uppercase letters (e.g., X, Y, and Z), are represented in code as tensors with two axes.
A = torch.arange(20).reshape(5, 4)
A
Output:
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
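The transpose of a matrix swaps its rows and columns; in PyTorch it is available as .T (my own quick illustration, continuing with the A defined above):
A.T
Output:
tensor([[ 0,  4,  8, 12, 16],
        [ 1,  5,  9, 13, 17],
        [ 2,  6, 10, 14, 18],
        [ 3,  7, 11, 15, 19]])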
As a special type of square matrix, a symmetric matrix A is equal to its own transpose: A = Aᵀ. Here we define a symmetric matrix B:
B = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
B
Output:
tensor([[1, 2, 3],
        [2, 0, 4],
        [3, 4, 5]])
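We can verify the symmetry by comparing B with its transpose element by element (my own quick check):
B == B.T
Output:
tensor([[True, True, True],
        [True, True, True],
        [True, True, True]])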
Just as vectors are a generalization of scalars and matrices are a generalization of vectors, we can build data structures with more axes. Tensors (in this section, "tensor" refers to algebraic objects) are a general way to describe n-dimensional arrays with an arbitrary number of axes.
Multidimensional Tensors:
X = torch.arange(24).reshape(2, 3, 4)
X
Output:
tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]])
One useful operation we can perform on any tensor is to compute the sum of its elements. Mathematical notation uses the ∑ symbol to represent summation.
x = torch.arange(4, dtype=torch.float32)
x, x.sum()
Output:
(tensor([0., 1., 2., 3.]), tensor(6.))
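Summation can also be restricted to a single axis, which is how matrices get reduced to row or column totals. For example, with the 5×4 matrix A from earlier (my own illustration):
A.sum(axis=0)  # sum over the rows; the result has shape (4,)
Output:
tensor([40, 45, 50, 55])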
Dot product:
y = torch.ones(4, dtype = torch.float32)
x, y, torch.dot(x, y)
Output:
(tensor([0., 1., 2., 3.]), tensor([1., 1., 1., 1.]), tensor(6.))
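Note that the dot product is just an element-wise multiplication followed by a sum, so the same value can be obtained like this (my own check):
torch.sum(x * y)
Output:
tensor(6.)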
Now that we know how to calculate dot products, we can move on to matrix-vector products.
A = A.float()  # convert A to float32 so its dtype matches x (required by torch.mv)
A.shape, x.shape, torch.mv(A, x)
Output:
(torch.Size([5, 4]), torch.Size([4]), tensor([ 14., 38., 62., 86., 110.]))
Matrix-matrix multiplication should be straightforward now that we know about dot products and matrix-vector products.
B = torch.ones(4, 3)
torch.mm(A, B)
Output:
tensor([[ 6.,  6.,  6.],
        [22., 22., 22.],
        [38., 38., 38.],
        [54., 54., 54.],
        [70., 70., 70.]])
Some of the most useful operators in linear algebra are norms. Informally speaking, the norm of a vector indicates how big the vector is; the notion of size here refers not to the number of dimensions but to the magnitude of the components.
The L2 norm is the square root of the sum of the squares of the vector's elements; its square is used very often in deep learning. The L1 norm is also frequently encountered: it is the sum of the absolute values of the vector's elements.
Compared with the L2 norm, the L1 norm is less affected by outliers. To compute the L1 norm, we combine the absolute value function with a sum over the elements:
u = torch.tensor([3.0, -4.0])  # example vector
torch.abs(u).sum()             # L1 norm: |3| + |-4| = 7
Output:
tensor(7.)
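For comparison, the L2 norm of the same vector can be computed with torch.norm (my own addition, not part of the excerpt above):
torch.norm(u)  # sqrt(3^2 + (-4)^2) = 5
Output:
tensor(5.)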
3. Calculus
This section covers a few main topics: derivatives, partial derivatives, gradients, and the chain rule.
Derivatives should be familiar to everyone: a derivative is simply the rate at which a function changes.
In deep learning, functions usually depend on many variables, so we need to extend the idea of differentiation to multivariate functions.
In a multivariate function, the derivative with respect to one of the independent variables, with all the others held fixed, is called a partial derivative.
We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the gradient vector of the function.
Chain rule: if y = f(u) and u = g(x), then dy/dx = (dy/du)(du/dx); this is how derivatives of composite functions are computed.
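PyTorch's automatic differentiation (covered in the same chapter) applies these rules for us. Here is a minimal sketch of my own, not copied from the book, showing how a gradient is computed with autograd:
import torch

x = torch.arange(4.0, requires_grad=True)  # tell PyTorch to track operations on x
y = 2 * torch.dot(x, x)                    # a scalar function of x
y.backward()                               # backpropagate through the computation graph
print(x.grad)                              # the gradient dy/dx equals 4x
Output:
tensor([ 0.,  4.,  8., 12.])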
4. Probability
The probability part mainly covers the following topics:
- Axioms of Probability Theory
- Random Variables
- Dealing with Multiple Random Variables
- Joint Probability
- Conditional Probability
- Bayes' Theorem
- Marginalization
- Independence
These are all standard topics from probability and statistics. When they come up later, we will return to them in more detail.
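To make these ideas a little more concrete, here is a small sampling sketch of my own (the book's examples use torch.distributions in a similar spirit), simulating 1000 rolls of a fair die:
import torch
from torch.distributions import multinomial

fair_probs = torch.ones(6) / 6                               # a fair six-sided die
counts = multinomial.Multinomial(1000, fair_probs).sample()  # counts of each face over 1000 rolls
print(counts / 1000)                                         # relative frequencies, each close to 1/6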
Studying this kind of pure theory is admittedly rather abstract and dry.
That covers all of the preliminary knowledge in Chapter 2. With these mathematical foundations in place, the deep learning content that follows will be much easier to digest.