Sharing the courseware for a three-day TensorFlow crash course
This is a minimalist TensorFlow introductory tutorial published by the Hong Kong University of Science and Technology. The three-day full slide tutorial has been shared to Google Drive. Synced will briefly introduce the tutorial and use it to sort out the introductory concepts and implementation of TensorFlow.
The first day of the course introduces the potential and basic concepts of deep learning and machine learning, and then begins to explore the deep learning framework TensorFlow. First, we will learn how to install TensorFlow. In fact, we feel that the TensorFlow environment configuration is quite convenient, and basically you can complete the installation by following the tutorial on the official website. Then, starting from "Hello TensorFlow", we will explain the basic concepts such as computational graphs, placeholders, tensors, etc.
Of course, to truly understand TensorFlow, we need to learn the most basic concepts little by little from actual practice, so the first day focused on linear regression, logistic regression, softmax classification and neural networks. Each model starts with the most basic concepts to derive the operation process, and then combines TensorFlow to explain the true meaning of tensors, computational graphs, etc. The neural network part is explained in great detail. We will start with the most basic perceptron principle and then use multi-layer perceptrons to solve the XOR problem. The key point is that this course derives the mathematical process of forward propagation and back propagation in detail and implements it with TensorFlow.
The second day of the tutorial discusses convolutional neural networks in detail, starting with the training and construction techniques of TensorFlow, explaining various weight initialization methods, activation functions, loss functions, regularization, and various optimization methods applied to neural networks. In the subsequent part of the tutorial discussing the principles of CNN, we can see that most of the explanations are based on the Stanford CS231n course. The last part of the second day is to implement the previous theory using TensorFlow. The tutorial uses separate code blocks to explain the concepts of various parts of CNN, such as 2D convolutional layers and max pooling layers.
The third day of the tutorial explains the recurrent neural network in detail. It starts with time series data and explains the basic concepts and principles of RNN, including very advanced and efficient mechanisms such as encoder-decoder mode, attention mechanism and gated recurrent unit. The latter part of the tutorial uses a lot of implementation code to explain the basic concepts of recurrent neural networks we have learned before, including the construction of a single recurrent unit in TensorFlow, the construction of batch input and recurrent layers, the construction of RNN sequence loss function, and training calculation graphs.
Below, Synced will briefly introduce the basic concepts of TensorFlow and the introductory implementation of TensorFlow machine learning based on the tutorial materials. For more details, please refer to the Hong Kong University of Science and Technology's three-day TensorFlow crash course materials:
- Three-day crash course Google Drive data address: https://drive.google.com/drive/folders/0B41Zbb4c8HVyY1F5Ml94Z2hodkE
- Three-day crash course Baidu cloud disk data address: http://pan.baidu.com/s/1boGGzeR
TensorFlow Basics
This section will briefly introduce TensorFlow from the basic concepts of tensors and graphs, constants and variables, and placeholders. Readers who need to learn more about TensorFlow can read Google's TensorFlow documentation, or other Chinese tutorials or books, such as "TensorFlow: Google's Deep Learning Framework in Action" and "TensorFlow in Action".
- TensorFlow documentation address: https://www.tensorflow.org/get_started/
1. Graphs
TensorFlow is an open source software library for numerical computation using data flow graphs. Tensor represents the data being transmitted as tensors (multidimensional arrays), and Flow represents the use of computational graphs for computation. Data flow graphs use directed graphs consisting of "nodes" and "edges" to describe mathematical operations. "Nodes" are generally used to represent the mathematical operations applied, but can also represent the starting point of data input and the end point of output, or the end point of reading/writing persistent variables. Edges represent the input/output relationship between nodes. These data edges can transmit multidimensional data arrays with dynamically adjustable dimensions, namely tensors.
In TensorFlow, all variables and operations are stored in the computational graph. So after we build the graph required for the model, we need to open a session to run the entire computational graph. In the session, we can distribute all calculations to the available CPU and GPU resources.
In the tutorial's example, a computational graph is built for an addition operation. The code block that only defines the graph does not output the calculation result, because defining a graph does not run it. The result is output only after we create a session, which manages all the resources of the TensorFlow runtime, and run the graph inside it. After the calculation is completed, the session needs to be closed so that the system can reclaim its resources; otherwise resources may leak.
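The "define first, run later" idea can be illustrated without TensorFlow at all. The following is a minimal pure-Python sketch; the classes `Constant`, `Add`, and `ToySession` are hypothetical names invented for this illustration and are not part of the TensorFlow API.

```python
# Toy illustration of deferred execution; NOT the TensorFlow API.
class Constant:
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add:
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        return self.left.evaluate() + self.right.evaluate()


class ToySession:
    """Hypothetical stand-in for a session: it only triggers evaluation."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # A real session would release its runtime resources here.
        return False

    def run(self, node):
        return node.evaluate()


node1 = Constant(3.0)
node2 = Constant(4.0)
node3 = Add(node1, node2)    # builds the graph; nothing is computed yet

with ToySession() as sess:   # the with-block "closes" the session for us
    result = sess.run(node3)
print(result)  # 7.0
```

Using a `with` block is one common way to make sure a session is closed even if an error occurs while the graph is running.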
The most basic units in TensorFlow are constants, variables, and placeholders. After a constant is defined, its value and dimension are immutable, while after a variable is defined, its value is mutable but its dimension is immutable. In a neural network, variables are generally used as matrices that store weights and other trainable information, while constants can be used to store hyperparameters or other structural information. In the calculation graph above, both nodes 1 and 2 are constants defined with tf.constant(). We can declare different constants (tf.constant()) and variables (tf.Variable()), using dtypes such as tf.float32 and tf.int32 to declare floating-point and integer data respectively.
2. Placeholders and feed_dict
TensorFlow also supports placeholders, which have no initial value and only allocate necessary memory. In the session, placeholders can be fed data using feed_dict.
feed_dict is a dictionary, in which the value of each placeholder used needs to be given. When training a neural network, a batch of training samples needs to be provided each time. If the data selected in each iteration is represented by a constant, the computational graph of TensorFlow will be very large. Because for each additional constant, TensorFlow will add a node to the computational graph. Therefore, a neural network with millions of iterations will have an extremely large computational graph, but placeholders can solve this problem. It will only have one node, the placeholder.
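As a rough illustration of why a placeholder keeps the graph small, here is a toy pure-Python sketch. `Placeholder`, `Scale`, and `run` are hypothetical names made up for this example, not TensorFlow functions; the point is only that the same two nodes are reused for every batch.

```python
# Toy model of a placeholder and feed_dict; NOT the TensorFlow API.
class Placeholder:
    """A graph node with no value of its own; it is looked up in feed_dict."""

    def __init__(self, name):
        self.name = name


class Scale:
    """A graph node that multiplies every element of its input by a factor."""

    def __init__(self, source, factor):
        self.source = source
        self.factor = factor


def run(node, feed_dict):
    if isinstance(node, Placeholder):
        return feed_dict[node]            # value supplied at run time
    values = run(node.source, feed_dict)
    return [node.factor * v for v in values]


x = Placeholder("x")
doubled = Scale(x, 2.0)
graph_nodes = [x, doubled]                # the graph never grows

batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs = [run(doubled, feed_dict={x: batch}) for batch in batches]
print(outputs)           # [[2.0, 4.0], [6.0, 8.0], [10.0, 12.0]]
print(len(graph_nodes))  # 2
```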
3. Tensors
In TensorFlow, tensors are the basic carriers for computing graphs to perform operations. All the data we need to calculate is stored or declared in the form of tensors. As shown below, the tutorial gives the meaning of tensors of various orders.
The zero-order tensor is the familiar scalar, which only expresses the magnitude of a quantity without any other description. The first-order tensor is the familiar vector, which expresses not only a magnitude but also a direction. Generally speaking, a two-dimensional vector can represent the length and direction of a line segment in a plane, and a three-dimensional vector represents the length and direction of a line segment in space. The second-order tensor is a matrix, which we can regard as a table filled with numbers. Matrix operations are operations between one table and another. Of course, in theory we can generate tensors of any order, but in actual machine learning algorithms, we use first-order tensors (vectors) and second-order tensors (matrices) the most.
Generally speaking, the data types of each element in a tensor are as follows, namely floating point type and integer type. The 32-bit floating point type is generally used in neural networks.
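To make the notion of tensor order concrete, the sketch below represents tensors as nested Python lists and computes their shape and order (rank). The helper `shape` is a hypothetical function written for this illustration, and it assumes regular, non-empty nested lists.

```python
def shape(tensor):
    """Return the shape of a regular nested list as a tuple of dimensions."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]      # assumes every sub-list has the same length
    return tuple(dims)


scalar = 5.0                                    # order 0
vector = [1.0, 2.0, 3.0]                        # order 1
matrix = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]     # order 2
cube = [[[1.0], [2.0]], [[3.0], [4.0]]]         # order 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    print(name, "order:", len(shape(t)), "shape:", shape(t))
```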
4. TensorFlow Mechanics
Throughout the tutorial, the following diagram will appear repeatedly. This is basically the construction process followed by all TensorFlow machine learning models, that is, building a computational graph, feeding input tensors, updating weights, and returning output values.
In the first step, building the computational graph, we need to build the architecture of the entire model. For example, in a neural network model, we need to build the architecture of the entire neural network starting from the input layer, including the number of hidden layers, the number of neurons in each layer, the connections between layers and their weights, the activation function used by each neuron in the network, and so on. In addition, we also need to configure the entire training, validation, and testing process. For example, in a neural network, we define the entire forward propagation process and its parameters and set various training hyperparameters such as learning rate, regularization rate, and batch size.
The second step is to feed training data or test data into the model. In this step, TensorFlow generally needs to open a session to perform tasks such as parameter initialization and data feeding. For example, in computer vision, we need to randomly initialize the model parameters and feed images in batches (the number of images is equal to the batch size) into the defined convolutional neural network.
The third step is to update the weights and obtain the return value, which is generally used to control the training process and obtain the final prediction results.
TensorFlow Model Practice
TensorFlow Linear Regression
The tutorial introduces many basic concepts of linear regression, including linear fitting, loss function, gradient descent and other basic contents. We have always believed that linear regression is the best entry model for understanding machine learning, because its principles and concepts are very simple, but basically involve all the processes of machine learning. In general, the linear regression model can be summarized as follows:
The "×" is a data point, and we need to find a straight line to best fit these data points. The distance between the straight line and these data points is the loss function, so we hope to find a straight line that minimizes the loss function. The following is a simple example of using TensorFlow to build linear regression.
1. Construct the objective function (i.e. the “straight line”)
The objective function is H(x)=Wx+b, where x is the feature vector, W is the weight corresponding to each element in the feature vector, and b is the bias term.
import tensorflow as tf

# X and Y data
x_train = [1, 2, 3]
y_train = [1, 2, 3]
# Trainable parameters, initialized randomly
W = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
# Our hypothesis XW+b
hypothesis = x_train * W + b
As shown above, we define the operation y=wx+b, which is the straight line we need to fit.
2. Constructing the loss function
Next, we need to construct the loss function of the entire model, that is, the distance from each data point to the straight line. The loss function we construct here is the mean squared error:
cost(W, b) = \frac{1}{m} \sum_{i=1}^{m} (H(x_i) - y_i)^2
Here m is the number of data points. The function measures the average squared distance between the predicted value H(x_i) and the true value y_i of each data point. We can implement it using the following code:
# cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - y_train))
Here, tf.square() squares each element, and tf.reduce_mean() takes the mean over all elements.
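As a quick sanity check on what this cost computes, the same calculation can be done by hand in plain Python. The values `W = 2.0` and `b = 0.0` below are assumed purely for illustration; in the tutorial they are initialized randomly.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences, mirroring reduce_mean(square(...))."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared) / len(squared)


x_train = [1, 2, 3]
y_train = [1, 2, 3]

W, b = 2.0, 0.0                                   # assumed example values
hypothesis = [W * x + b for x in x_train]          # [2.0, 4.0, 6.0]
cost = mean_squared_error(hypothesis, y_train)     # (1 + 4 + 9) / 3
print(cost)

# A perfect fit (W = 1, b = 0) has zero cost.
perfect = mean_squared_error([1.0 * x + 0.0 for x in x_train], y_train)
print(perfect)  # 0.0
```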
3. Update weights using gradient descent
In order to find the straight line that best fits the data, we need to minimize the loss function, that is, the distance between the data and the straight line. We can do this with the gradient descent algorithm:
# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(cost)
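To show what a gradient descent optimizer does conceptually, here is a minimal pure-Python sketch that applies the update rule to the same three data points. It is not how TensorFlow implements the optimizer internally; the gradients are written out by hand for the mean squared error, and the starting values `W = 0.0`, `b = 0.0` are an assumption made for reproducibility (the tutorial initializes them randomly).

```python
x_train = [1.0, 2.0, 3.0]
y_train = [1.0, 2.0, 3.0]

W, b = 0.0, 0.0          # assumed fixed start instead of random initialization
learning_rate = 0.01
m = len(x_train)

for step in range(2001):
    errors = [W * x + b - y for x, y in zip(x_train, y_train)]
    # Partial derivatives of (1/m) * sum(error^2) with respect to W and b
    grad_W = (2.0 / m) * sum(e * x for e, x in zip(errors, x_train))
    grad_b = (2.0 / m) * sum(errors)
    # Move each parameter a small step against its gradient
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

cost = sum((W * x + b - y) ** 2 for x, y in zip(x_train, y_train)) / m
print(W, b, cost)  # W should approach 1 and b should approach 0
```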
4. Run the computational graph to perform training
# Launch the graph in a session.
sess = tf.Session()
# Initializes global variables in the graph.
sess.run(tf.global_variables_initializer())
# Fit the line
for step in range(2001):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(cost), sess.run(W), sess.run(b))
The above code opens a session, initializes the variables, and runs the training operation for 2001 steps, printing the cost, W, and b every 20 steps.
Finally, the course gives a complete implementation code. Beginners can try to implement this simple linear regression model:
import tensorflow as tf
W = tf.Variable(tf.random_normal([1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
X = tf.placeholder(tf.float32, shape=[None])
Y = tf.placeholder(tf.float32, shape=[None])
# Our hypothesis XW+b
hypothesis = X * W + b
# cost/loss function
cost = tf.reduce_mean(tf.square(hypothesis - Y))
# Minimize
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
train = optimizer.minimize(cost)
# Launch the graph in a session.
sess = tf.Session()
# Initializes global variables in the graph.
sess.run(tf.global_variables_initializer())
# Fit the line
for step in range(2001):
    cost_val, W_val, b_val, _ = sess.run([cost, W, b, train],
                                         feed_dict={X: [1, 2, 3], Y: [1, 2, 3]})
    if step % 20 == 0:
        print(step, cost_val, W_val, b_val)
Let's take a look at more of the course content below.
Logistic Regression
As usual, the course first introduces the basic concepts of logistic regression, and shows the objective function, loss function, and weight update process as follows.
The implementation code of logistic regression is shown below:
import numpy as np
import tensorflow as tf

# Load the data: all columns but the last are features, the last is the label
xy = np.loadtxt('data-03-diabetes.csv', delimiter=',', dtype=np.float32)
x_data = xy[:, 0:-1]
y_data = xy[:, [-1]]
# placeholders for a tensor that will be always fed.
X = tf.placeholder(tf.float32, shape=[None, 8])
Y = tf.placeholder(tf.float32, shape=[None, 1])
W = tf.Variable(tf.random_normal([8, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')
# Hypothesis using sigmoid: tf.div(1., 1. + tf.exp(-(tf.matmul(X, W) + b)))
hypothesis = tf.sigmoid(tf.matmul(X, W) + b)
# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) * tf.log(1 - hypothesis))
train = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(cost)
# Accuracy computation
# True if hypothesis>0.5 else False
predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))
# Launch graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {X: x_data, Y: y_data}
    for step in range(10001):
        sess.run(train, feed_dict=feed)
        if step % 200 == 0:
            print(step, sess.run(cost, feed_dict=feed))
    # Accuracy report (c holds the thresholded predictions)
    h, c, a = sess.run([hypothesis, predicted, accuracy], feed_dict=feed)
    print("\nHypothesis: ", h, "\nPredicted: ", c, "\nAccuracy: ", a)
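The two building blocks of the model above, the sigmoid hypothesis and the cross-entropy cost, can be checked by hand on a few numbers. The sketch below uses only the Python standard library; the `logits` and `labels` values are made-up examples, not data from the course.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probabilities, labels):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], as in the cost above."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)


logits = [2.0, -1.0, 0.0]     # assumed values of XW + b for three samples
labels = [1, 0, 1]

probabilities = [sigmoid(z) for z in logits]
cost = binary_cross_entropy(probabilities, labels)

# Threshold at 0.5, then compare with the labels
predicted = [1 if p > 0.5 else 0 for p in probabilities]
accuracy = sum(int(p == y) for p, y in zip(predicted, labels)) / len(labels)
print(probabilities, cost, predicted, accuracy)
```

Note that a probability of exactly 0.5 is classified as 0 here, because the comparison is a strict `> 0.5`, matching the thresholding in the TensorFlow code.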
Softmax Classification
The figure below shows the basic method of Softmax, which can produce class probabilities that sum to 1.
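As a small illustration of the "sum to 1" property, the sketch below implements softmax in plain Python. The `scores` values are an assumed example, and subtracting the maximum before exponentiating is a common numerical-stability trick rather than something taken from the course slides.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]   # shift for stability
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]          # assumed example scores for three classes
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(probs, sum(probs), predicted_class)
```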
The following code trains a softmax classifier on the MNIST dataset:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Setup not shown in the excerpt; the path and hyperparameter values are assumptions
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
learning_rate = 0.001
training_epochs = 15
batch_size = 100
# input placeholders: 28x28 images flattened to 784 values, 10 classes
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])
# weights & bias for nn layers
W = tf.Variable(tf.random_normal([784, 10]))
b = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(X, W) + b
# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))
print('Learning Finished!')
# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
Neural Networks
The following figure briefly introduces the operation process of the neural network. This part is very detailed and is a rare resource for beginners:
Below is the code of the tutorial using neural network to solve the XOR problem. The XOR problem is a very classic task. We can understand the power of neural network from this problem:
import numpy as np
import tensorflow as tf

# The four XOR input pairs and their labels
x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_data = np.array([[0], [1], [1], [0]], dtype=np.float32)
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)
# Hidden layer: 2 inputs -> 2 hidden units
W1 = tf.Variable(tf.random_normal([2, 2]), name='weight1')
b1 = tf.Variable(tf.random_normal([2]), name='bias1')
layer1 = tf.sigmoid(tf.matmul(X, W1) + b1)
# Output layer: 2 hidden units -> 1 output
W2 = tf.Variable(tf.random_normal([2, 1]), name='weight2')
b2 = tf.Variable(tf.random_normal([1]), name='bias2')
hypothesis = tf.sigmoid(tf.matmul(layer1, W2) + b2)
# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(hypothesis) + (1 - Y) * tf.log(1 - hypothesis))
train = tf.train.GradientDescentOptimizer(learning_rate=0.1).minimize(cost)
# Accuracy computation
# True if hypothesis>0.5 else False
predicted = tf.cast(hypothesis > 0.5, dtype=tf.float32)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predicted, Y), dtype=tf.float32))
# Launch graph
with tf.Session() as sess:
    # Initialize TensorFlow variables
    sess.run(tf.global_variables_initializer())
    for step in range(10001):
        sess.run(train, feed_dict={X: x_data, Y: y_data})
        if step % 100 == 0:
            print(step, sess.run(cost, feed_dict={X: x_data, Y: y_data}), sess.run([W1, W2]))
    # Accuracy report (c holds the thresholded predictions)
    h, c, a = sess.run([hypothesis, predicted, accuracy],
                       feed_dict={X: x_data, Y: y_data})
    print("\nHypothesis: ", h, "\nPredicted: ", c, "\nAccuracy: ", a)
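To see why one hidden layer is enough for XOR, the sketch below runs the forward pass of a 2-2-1 sigmoid network in plain Python. The weights are hand-picked for illustration (the first hidden unit behaves like OR, the second like NAND, and the output like AND); they are an assumption of this example, not the values the training loop above would learn.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked (not learned) parameters with the same shapes as W1, b1, W2, b2
W1 = [[20.0, -20.0],
      [20.0, -20.0]]      # shape [2, 2]; column j feeds hidden unit j
b1 = [-10.0, 30.0]        # unit 0 acts like OR, unit 1 like NAND
W2 = [20.0, 20.0]         # output acts like AND of the two hidden units
b2 = -30.0


def forward(x):
    hidden = [sigmoid(x[0] * W1[0][j] + x[1] * W1[1][j] + b1[j])
              for j in range(2)]
    return sigmoid(hidden[0] * W2[0] + hidden[1] * W2[1] + b2)


x_data = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_data = [0, 1, 1, 0]

predicted = [1 if forward(x) > 0.5 else 0 for x in x_data]
print(predicted)  # [0, 1, 1, 0]
```

A single-layer perceptron cannot produce this output, because no single straight line separates the two XOR classes; the hidden layer supplies the two intermediate features that make the problem separable.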
Convolutional Neural Networks
The second day of the tutorial moves on to convolutional neural networks. Here we only use the following figure to show the general architecture of a convolutional neural network. For more information, please refer to the original courseware:
This tutorial also provides a lot of convolutional network implementation codes. Below we briefly introduce a simple convolutional neural network implementation process. The architecture of the convolutional neural network is as follows:
The following code creates the first convolutional layer, which is the convolutional layer 1 and pooling layer 1 in the figure above:
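import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Setup not shown in the excerpt; the path and hyperparameter values are assumptions
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
learning_rate = 0.001
training_epochs = 15
batch_size = 100
# input placeholders
X = tf.placeholder(tf.float32, [None, 784])
X_img = tf.reshape(X, [-1, 28, 28, 1])  # img 28x28x1 (black/white)
Y = tf.placeholder(tf.float32, [None, 10])
# L1 ImgIn shape=(?, 28, 28, 1)
W1 = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
# Conv -> (?, 28, 28, 32)
# Pool -> (?, 14, 14, 32)
L1 = tf.nn.conv2d(X_img, W1, strides=[1, 1, 1, 1], padding='SAME')
L1 = tf.nn.relu(L1)
L1 = tf.nn.max_pool(L1, ksize=[1, 2, 2, 1],
                    strides=[1, 2, 2, 1], padding='SAME')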
'''
Tensor("Conv2D:0", shape=(?, 28, 28, 32), dtype=float32)
Tensor("Relu:0", shape=(?, 28, 28, 32), dtype=float32)
Tensor("MaxPool:0", shape=(?, 14, 14, 32), dtype=float32)
'''
The following code constructs the second convolutional layer, which is the convolutional layer 2 and pooling layer 2 in the figure above:
# L2 ImgIn shape=(?, 14, 14, 32)
W2 = tf.Variable(tf.random_normal([3, 3, 32, 64], stddev=0.01))
# Conv ->(?, 14, 14, 64)
# Pool ->(?, 7, 7, 64)
L2 = tf.nn.conv2d(L1, W2, strides=[1, 1, 1, 1], padding='SAME')
L2 = tf.nn.relu(L2)
L2 = tf.nn.max_pool(L2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
L2 = tf.reshape(L2, [-1, 7 * 7 * 64])
'''
Tensor("Conv2D_1:0", shape=(?, 14, 14, 64), dtype=float32)
Tensor("Relu_1:0", shape=(?, 14, 14, 64), dtype=float32)
Tensor("MaxPool_1:0", shape=(?, 7, 7, 64), dtype=float32)
Tensor("Reshape_1:0", shape=(?, 3136), dtype=float32)
'''
Finally, we only need to build a fully connected layer to complete the construction of the entire CNN architecture, that is, use the following code to build the purple fully connected layer in the figure above:
# Final FC 7x7x64 inputs -> 10 outputs
W3 = tf.get_variable("W3", shape=[7 * 7 * 64, 10], initializer=tf.contrib.layers.xavier_initializer())
b = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L2, W3) + b
# define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
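The shape comments in the layers above follow from a simple rule: with 'SAME' padding, each spatial dimension of the output is the input size divided by the stride, rounded up. The sketch below retraces the shapes in plain Python; the helper `same_padding_output` and the layer list are written for this illustration only.

```python
import math


def same_padding_output(size, stride):
    """Output size of a conv or pooling layer that uses 'SAME' padding."""
    return math.ceil(size / stride)


height = width = 28       # MNIST images are 28x28
shapes = []
# (layer name, stride, output channels) for the four layers built above
for name, stride, channels in [("conv1", 1, 32), ("pool1", 2, 32),
                               ("conv2", 1, 64), ("pool2", 2, 64)]:
    height = same_padding_output(height, stride)
    width = same_padding_output(width, stride)
    shapes.append((name, height, width, channels))

flattened = height * width * channels   # input size of the fully connected layer
print(shapes)
print(flattened)  # 3136
```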
Finally, we only need to train the CNN to complete the entire model:
# initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())
# train my model
print('Learning started. It takes some time.')
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)
    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict = {X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))
print('Learning Finished!')
# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Accuracy:', sess.run(accuracy, feed_dict={X: mnist.test.images, Y: mnist.test.labels}))
Recurrent Neural Networks
The third day of this tutorial talks about recurrent neural networks. The following figure shows the expansion of the recurrent unit, which is the core of processing time series data. For more detailed information, please check the courseware.
The following TensorFlow code defines a simple recurrent unit:
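import pprint
import numpy as np
import tensorflow as tf

# Helpers not shown in the excerpt (assumed): a pretty-printer and an
# interactive session, which lets outputs.eval() find a default session
pp = pprint.PrettyPrinter(indent=4)
sess = tf.InteractiveSession()
# One cell RNN input_dim (4) -> output_dim (2)
hidden_size = 2
cell = tf.contrib.rnn.BasicRNNCell(num_units=hidden_size)
x_data = np.array([[[1, 0, 0, 0]]], dtype=np.float32)
outputs, _states = tf.nn.dynamic_rnn(cell, x_data, dtype=tf.float32)
sess.run(tf.global_variables_initializer())
pp.pprint(outputs.eval())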
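Conceptually, a basic recurrent cell combines the current input with the previous hidden state and passes the result through tanh. The sketch below performs one such step in plain Python for a 4-dimensional input and a hidden size of 2. The weight values are made up for illustration (TensorFlow initializes them randomly), and the function `rnn_step` is a hypothetical helper, not a TensorFlow call.

```python
import math


def rnn_step(x, h_prev, W_x, W_h, bias):
    """One step of a basic RNN cell: h = tanh(x.W_x + h_prev.W_h + bias)."""
    hidden_size = len(bias)
    h_new = []
    for j in range(hidden_size):
        total = bias[j]
        total += sum(x[i] * W_x[i][j] for i in range(len(x)))
        total += sum(h_prev[k] * W_h[k][j] for k in range(hidden_size))
        h_new.append(math.tanh(total))
    return h_new


# Assumed example parameters: input_dim 4 -> hidden_size 2
W_x = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
W_h = [[0.0, 0.1], [0.1, 0.0]]
bias = [0.0, 0.0]

x = [1, 0, 0, 0]          # one-hot input, as in the snippet above
h0 = [0.0, 0.0]           # initial hidden state
h1 = rnn_step(x, h0, W_x, W_h, bias)
h2 = rnn_step(x, h1, W_x, W_h, bias)   # the next step reuses h1 as its state
print(h1, h2)
```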
The course then shows a simple recurrent neural network example, as shown below, which trains an RNN to output "hihello".
1. Create RNN cell
As shown below, three types of RNN units can generally be created in TensorFlow, namely RNN units, LSTM units, and GRU units.
# RNN model: choose one of the three cell types
cell = tf.contrib.rnn.BasicRNNCell(rnn_size)
# cell = tf.contrib.rnn.BasicLSTMCell(rnn_size)
# cell = tf.contrib.rnn.GRUCell(rnn_size)
2. Execute RNN
# RNN model
cell = tf.contrib.rnn.BasicRNNCell(rnn_size)
outputs, _states = tf.nn.dynamic_rnn(
    cell, X, initial_state=initial_state, dtype=tf.float32)
3. Setting the parameters of RNN
hidden_size = 5  # output from the LSTM
input_dim = 5  # one-hot size
batch_size = 1  # one sentence
sequence_length = 6  # |ihello| == 6
4. Create data
idx2char = ['h', 'i', 'e', 'l', 'o']  # h=0, i=1, e=2, l=3, o=4
x_data = [[0, 1, 0, 2, 3, 3]]  # hihell
x_one_hot = [[[1, 0, 0, 0, 0],  # h 0
              [0, 1, 0, 0, 0],  # i 1
              [1, 0, 0, 0, 0],  # h 0
              [0, 0, 1, 0, 0],  # e 2
              [0, 0, 0, 1, 0],  # l 3
              [0, 0, 0, 1, 0]]]  # l 3
y_data = [[1, 0, 2, 3, 3, 4]]  # ihello
X = tf.placeholder(tf.float32, [None, sequence_length, input_dim])  # X one-hot
Y = tf.placeholder(tf.int32, [None, sequence_length])  # Y label
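The index and one-hot lists above do not have to be typed by hand. As a small sketch, they can be derived from the string "hihello" in plain Python: the input is every character except the last, and the target is the same string shifted by one character. The helper names `char2idx` and `one_hot` are introduced here for illustration.

```python
idx2char = ['h', 'i', 'e', 'l', 'o']
char2idx = {c: i for i, c in enumerate(idx2char)}   # reverse lookup table


def one_hot(index, depth):
    """Return a list of zeros with a single 1 at the given index."""
    return [1 if i == index else 0 for i in range(depth)]


text = "hihello"
x_indices = [char2idx[c] for c in text[:-1]]   # input:  "hihell"
y_indices = [char2idx[c] for c in text[1:]]    # target: "ihello"

x_data = [x_indices]                                        # batch of one
x_one_hot = [[one_hot(i, len(idx2char)) for i in x_indices]]
y_data = [y_indices]
print(x_data, y_data)
```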
5. Feeding data into RNN
X = tf.placeholder(
    tf.float32, [None, sequence_length, input_dim])  # X one-hot
Y = tf.placeholder(tf.int32, [None, sequence_length])  # Y label
cell = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size, state_is_tuple=True)
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, _states = tf.nn.dynamic_rnn(
    cell, X, initial_state=initial_state, dtype=tf.float32)
6. Create a sequence loss function
outputs, _states = tf.nn.dynamic_rnn(
    cell, X, initial_state=initial_state, dtype=tf.float32)
weights = tf.ones([batch_size, sequence_length])
sequence_loss = tf.contrib.seq2seq.sequence_loss(
    logits=outputs, targets=Y, weights=weights)
loss = tf.reduce_mean(sequence_loss)
train = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)
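As a rough picture of what a sequence loss measures, the sketch below computes a weighted average of the softmax cross-entropy over every time step of every sequence, in plain Python. This assumes the loss is averaged across both the batch and the time steps; it is an illustration of the idea, not a reimplementation of the TensorFlow function, and the all-zero logits are a made-up example.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of one time step: log(sum(exp(logits))) - logits[target]."""
    largest = max(logits)
    log_sum = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum - logits[target]


def sequence_loss(logits, targets, weights):
    """Weighted average of the per-step cross-entropy over batch and time."""
    total = 0.0
    weight_sum = 0.0
    for seq_logits, seq_targets, seq_weights in zip(logits, targets, weights):
        for step_logits, target, w in zip(seq_logits, seq_targets, seq_weights):
            total += w * softmax_cross_entropy(step_logits, target)
            weight_sum += w
    return total / weight_sum


y_data = [[1, 0, 2, 3, 3, 4]]                    # "ihello"
uniform_logits = [[[0.0] * 5 for _ in range(6)]]  # an untrained, clueless model
weights = [[1.0] * 6]                             # every step counts equally

loss = sequence_loss(uniform_logits, y_data, weights)
print(loss, math.log(5))   # a uniform guess over 5 characters costs log(5)
```

Setting a weight to 0 would exclude that time step from the average, which is how padded positions are usually ignored.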
7. Training RNN
This is the last step where we will open a TensorFlow session to finish training the model.
prediction = tf.argmax(outputs, axis=2)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(2000):
        l, _ = sess.run([loss, train], feed_dict={X: x_one_hot, Y: y_data})
        result = sess.run(prediction, feed_dict={X: x_one_hot})
        print(i, "loss:", l, "prediction: ", result, "true Y: ", y_data)
        # print char using dic
        result_str = [idx2char[c] for c in np.squeeze(result)]
        print("\tPrediction str: ", ''.join(result_str))