#AI Challenge Camp First Stop# Implementing MNIST handwritten digit recognition with PyTorch
This post uses the PyTorch framework and a convolutional neural network (CNN) to implement MNIST handwritten digit recognition.
1 PyTorch environment installation
conda create -n pytorch python=3.6
conda activate pytorch
conda install pytorch torchvision torchaudio cpuonly -c pytorch
If you encounter network-related installation problems, you can install the packages offline; see the reference links at the end for detailed steps to install the CPU version of torch and torchvision.
PyTorch environment check
print(torch.__version__) prints the installed torch version. If print(torch.cuda.is_available()) returns False, the CPU-only build is installed.
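The check described above is just two lines; as a minimal sketch:

import torch

print(torch.__version__)          # installed PyTorch version, e.g. 1.10.2
print(torch.cuda.is_available())  # False means the CPU-only build is installed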
2 Dataset (MNIST)
The MNIST dataset is a very classic dataset in the field of machine learning. You can download it from the official website http://yann.lecun.com/exdb/mnist/ or add the download=True option when loading the data later.
- Data preprocessing: transforms.Compose defines the preprocessing pipeline, which converts each image into a tensor (scaling pixel values into [0, 1]) and then normalizes it with the MNIST mean (0.1307) and standard deviation (0.3081).
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
- Data loading: The MNIST dataset in torchvision.datasets is used, the data is loaded through DataLoader, and the batch size and whether to shuffle the data are set.
train_dataset = datasets.MNIST(root='./data/', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data/', train=False, download=True, transform=transform)  # train=True for the training set, train=False for the test set
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
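As a quick sanity check (not part of the original walkthrough), you can pull one batch from the loader defined above and confirm its shape; with the transform above, each image batch should be batch_size x 1 x 28 x 28:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]): batch, channel, height, width
print(labels.shape)  # torch.Size([64]): one digit label per image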
3 Building the model (CNN)
- Model structure: Build a CNN model by defining a class inherited from torch.nn.Module, which includes convolutional layers, activation functions, and pooling layers.
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(batch_size, -1)  # flatten: (batch, 20, 4, 4) -> (batch, 320)
        x = self.fc(x)
        return x
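To see where the 320 input features of the first Linear layer come from, you can trace a dummy batch through the two blocks: 28x28 -> conv(5x5) -> 24x24 -> pool -> 12x12 -> conv(5x5) -> 8x8 -> pool -> 4x4, with 20 channels, so 20 x 4 x 4 = 320. A small shape check (continuing from the class defined above, not in the original post):

model = Net()
dummy = torch.randn(1, 1, 28, 28)  # one fake MNIST image: (batch, channel, height, width)
out1 = model.conv1(dummy)
print(out1.shape)                  # torch.Size([1, 10, 12, 12])
out2 = model.conv2(out1)
print(out2.shape)                  # torch.Size([1, 20, 4, 4]) -> flattens to 320 features
print(model(dummy).shape)          # torch.Size([1, 10]): one score per digit class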
- Model training: the forward method in the model class performs forward propagation; the loss function (cross-entropy loss) and the optimizer (stochastic gradient descent) are also defined.
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)
Defining the training and testing loops
- Training loop: defines a training function that performs forward propagation, loss calculation, backpropagation, and parameter updates.
def train(epoch):
    running_loss = 0.0
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        # accumulate the running loss so it can be averaged over 300 batches below
        running_loss += loss.item()
        # compute the running accuracy
        _, predicted = torch.max(outputs.data, dim=1)
        running_total += inputs.shape[0]
        running_correct += (predicted == target).sum().item()

        if batch_idx % 300 == 299:  # printing every batch wastes time, so report an average loss and accuracy every 300 batches
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0  # reset the loss for the next 300 batches
            running_total = 0
            running_correct = 0  # reset the accuracy counters for the next 300 batches
- Testing loop: defines a test function that evaluates the model's performance on the test set.
def test():
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed on the test set
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # dim=1: find the max value and its index along the class dimension
            total += labels.size(0)
            correct += (predicted == labels).sum().item()  # tensor comparison, then count the matches
    acc = correct / total
    # epoch and EPOCH are globals set by the training loop below
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch + 1, EPOCH, 100 * acc))  # test accuracy = correct / total
    return acc
Start training
- Hyperparameter settings: set hyperparameters such as batch size, learning rate, and momentum. The choice of these hyperparameters has a significant impact on how well the model trains.
- Training process: use a loop to run multiple epochs of training, testing once after each epoch; a minimal sketch follows below.
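A minimal sketch of the driver loop described above, assuming the train and test functions and the EPOCH hyperparameter defined earlier (the best-model bookkeeping mirrors the complete code in section 5):

acc_list_test = []  # test accuracy per epoch, for plotting later
acc_best = 0        # best test accuracy seen so far

for epoch in range(EPOCH):
    train(epoch)       # one full pass over the training set
    acc_test = test()  # evaluate on the test set after each epoch
    if acc_test > acc_best:
        acc_best = acc_test
        # keep the parameters of the best-performing model so far
        torch.save(model.state_dict(), './model_mnist_best.pth')
    acc_list_test.append(acc_test)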
Save Parameters
torch.save(model.state_dict(), './model_mnist.pth')
torch.save(optimizer.state_dict(), './optimizer_mnist.pth')
Both model.state_dict() and optimizer.state_dict() return state dictionaries used in PyTorch to save and restore state, but they hold different contents:
- model.state_dict(): returns a Python dictionary containing all the model's parameters (weights and biases), keyed by layer and parameter name with the corresponding tensor values. It is typically used to save and load the parameter state of a model: the dictionary can be saved to a file and loaded back into the model with the load_state_dict() method.
- optimizer.state_dict(): returns a Python dictionary containing the optimizer's state, including its hyperparameters and per-parameter state such as momentum buffers. It is typically used to save and restore the optimizer so that training can be resumed after an interruption.
It is common to save both so that the model and the training state can be fully restored when needed, without having to restart training.
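A sketch of restoring both states, assuming the Net class, learning_rate, and momentum defined earlier (the file names match the save calls above):

# Recreate the model and optimizer with the same structure as during training
model = Net()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)

# Load the saved states back in
model.load_state_dict(torch.load('./model_mnist.pth'))
optimizer.load_state_dict(torch.load('./optimizer_mnist.pth'))

model.train()  # switch back to training mode before resuming training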
4 ONNX Conversion
- Define the model: first, make sure you have defined the PyTorch model to export and loaded its parameters.
- Prepare the input: prepare an example input used to infer the model's input shape during export.
- Export the model: use the torch.onnx.export() function, specifying the model, the example input, the export file path, and other parameters.
# Export to an ONNX model
input = torch.randn(1, 1, 28, 28)  # dummy input with shape (batch, channels, height, width)
torch.onnx.export(model, input, "mnist.onnx", verbose=True)
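Optionally, the exported file can be validated and compared against the PyTorch model. A sketch using the onnx and onnxruntime packages (both pip-installable; they are not used elsewhere in this post, and model is assumed to be the trained network from above):

import numpy as np
import onnx
import onnxruntime as ort
import torch

# Structural check of the exported graph
onnx_model = onnx.load("mnist.onnx")
onnx.checker.check_model(onnx_model)

# Run one dummy input through ONNX Runtime and compare with PyTorch
dummy = np.random.randn(1, 1, 28, 28).astype(np.float32)
session = ort.InferenceSession("mnist.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
onnx_out = session.run(None, {input_name: dummy})[0]

model.eval()
torch_out = model(torch.from_numpy(dummy)).detach().numpy()
print(np.allclose(onnx_out, torch_out, atol=1e-5))  # should print True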
ONNX View
Netron is a lightweight ONNX model viewer that runs in your browser. Install Netron using pip:
pip install netron
You can then start Netron and open the ONNX file using the following command:
netron <your_model.onnx>
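Netron can also be launched from a Python session; a minimal sketch assuming the same pip package:

import netron

# Serves the viewer locally and opens the model in the default browser
netron.start("mnist.onnx")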
5 Complete code
The following code trains the MNIST handwritten digit recognition model and prints the loss and accuracy for each epoch during training. The program includes the following main parts:
- Import necessary libraries: PyTorch, NumPy, Matplotlib, and DataLoader are imported, along with the MNIST dataset and some transform operations.
- Hyperparameter settings: hyperparameters such as batch size, learning rate, momentum, and number of training epochs are defined.
- Data preparation: defines the transforms and loaders for the MNIST dataset.
- Model design: a simple convolutional neural network is built using torch.nn.Sequential, with two convolutional layers and two fully connected layers.
- Loss function and optimizer: the cross-entropy loss function and the stochastic gradient descent (SGD) optimizer are defined.
- Training and testing functions: train and test functions are defined. Training iterates over the batches in the training loader, computes the loss, updates the model parameters, and prints the loss and accuracy for the epoch. Testing iterates over the batches in the test loader to evaluate the model's accuracy on the test set.
- Command-line argument parsing: the argparse module parses command-line arguments so that the number of training epochs can be specified on the command line.
- Training and testing process: in the if __name__ == '__main__': block, the command-line parameters are read and training and testing begin. After each epoch, the test accuracy is checked; if it is the best so far, the model parameters are saved. Finally, the test accuracy is plotted against the training epoch.
- Export ONNX model: after training completes, the best-performing model parameters are loaded and exported as a model file in ONNX format.
import torch
import numpy as np
from matplotlib import pyplot as plt
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision import datasets
import torch.nn.functional as F
import argparse

# Workaround for "OMP: Error #15" (duplicate OpenMP runtime)
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

"""
Convolution on the MNIST dataset, similar to examples 10-4 and 11, except that here:
1. the training accuracy is printed during each epoch, and 2. the model uses torch.nn.Sequential.
"""

# Hyperparameters -------------------------------------------------------------------------------
batch_size = 64
learning_rate = 0.01
momentum = 0.5
EPOCH = 10

# Prepare dataset -------------------------------------------------------------------------------
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# 0.1307 is the mean and 0.3081 the standard deviation of the MNIST training set
# (see https://blog.csdn.net/lz_peter/article/details/84574716)
train_dataset = datasets.MNIST(root='./data/mnist', train=True, transform=transform, download=True)  # add download=True if the data is not available locally
test_dataset = datasets.MNIST(root='./data/mnist', train=False, transform=transform)  # train=True for the training set, train=False for the test set
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

# fig = plt.figure()
# for i in range(12):
#     plt.subplot(3, 4, i + 1)
#     plt.tight_layout()
#     plt.imshow(train_dataset.data[i], cmap='gray', interpolation='none')
#     plt.title("Labels: {}".format(train_dataset.targets[i]))
#     plt.xticks([])
#     plt.yticks([])
# plt.show()

# The training set is shuffled; the test set is kept in order
# Design model using class -----------------------------------------------------------------------
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Sequential(
            torch.nn.Conv2d(1, 10, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.conv2 = torch.nn.Sequential(
            torch.nn.Conv2d(10, 20, kernel_size=5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2),
        )
        self.fc = torch.nn.Sequential(
            torch.nn.Linear(320, 50),
            torch.nn.Linear(50, 10),
        )

    def forward(self, x):
        batch_size = x.size(0)
        x = self.conv1(x)  # convolution, activation, pooling (some diagrams pool before activating; the difference is minor)
        x = self.conv2(x)  # and again
        x = x.view(batch_size, -1)  # flatten for the fully connected layers: (batch, 20, 4, 4) ==> (batch, 320); -1 infers the 320 automatically
        x = self.fc(x)
        return x  # the output has 10 dimensions, one per digit class (0-9)


model = Net()

# Construct loss and optimizer -------------------------------------------------------------------
criterion = torch.nn.CrossEntropyLoss()  # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)  # lr: learning rate, momentum: momentum factor
# Train and test functions -----------------------------------------------------------------------
# Each single epoch is wrapped in a function
def train(epoch):
    running_loss = 0.0  # reset the loss accumulator for this epoch
    running_total = 0
    running_correct = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        optimizer.zero_grad()

        # forward + backward + update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()

        # accumulate the running loss so it can be averaged over 300 batches below
        running_loss += loss.item()
        # compute the running accuracy
        _, predicted = torch.max(outputs.data, dim=1)
        running_total += inputs.shape[0]
        running_correct += (predicted == target).sum().item()

        if batch_idx % 300 == 299:  # printing every batch wastes time, so report an average loss and accuracy every 300 batches
            print('[%d, %5d]: loss: %.3f , acc: %.2f %%'
                  % (epoch + 1, batch_idx + 1, running_loss / 300, 100 * running_correct / running_total))
            running_loss = 0.0  # reset the loss for the next 300 batches
            running_total = 0
            running_correct = 0  # reset the accuracy counters for the next 300 batches

    # torch.save(optimizer.state_dict(), './optimizer_mnist.pth')


def test():
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed on the test set
        for data in test_loader:
            images, labels = data
            outputs = model(images)
            _, predicted = torch.max(outputs.data, dim=1)  # dim=1: find the max value and its index along the class dimension
            total += labels.size(0)
            correct += (predicted == labels).sum().item()  # tensor comparison, then count the matches
    acc = correct / total
    # epoch and EPOCH are globals set by the main loop below
    print('[%d / %d]: Accuracy on test set: %.1f %% ' % (epoch + 1, EPOCH, 100 * acc))  # test accuracy = correct / total
    return acc
# Start training and testing ---------------------------------------------------------------------
if __name__ == '__main__':
    # create the ArgumentParser object
    parser = argparse.ArgumentParser(description='Train a model with specified number of epochs.')
    # add the --epochs argument
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    # parse the command-line arguments
    args = parser.parse_args()
    # read the value passed on the command line via args.epochs
    EPOCH = epochs = args.epochs
    print(f'Number of epochs to train: {epochs}')

    acc_list_test = []
    acc_best = 0
    epoch_best = 0
    for epoch in range(EPOCH):
        train(epoch)
        # if epoch % 10 == 9:  # alternatively, test once every 10 training epochs
        acc_test = test()
        if acc_test > acc_best:
            acc_best = acc_test
            epoch_best = epoch
            torch.save(model.state_dict(), './model_mnist_best.pth')
        acc_list_test.append(acc_test)

    print('Best: [%d / %d]: Accuracy on test set: %.1f %% ' % (epoch_best + 1, EPOCH, 100 * acc_best))
    plt.plot(acc_list_test)
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy On TestSet')
    plt.show()

    # export to an ONNX model
    model.load_state_dict(torch.load('model_mnist_best.pth'))
    input = torch.randn(1, 1, 28, 28)  # dummy input with shape (batch, channels, height, width)
    torch.onnx.export(model, input, "mnist.onnx", verbose=True)
6 Reference links
- A comprehensive summary of the differences between pip install and conda install
- (Detailed explanation) Installing PyTorch (CPU version): detailed installation steps for torch and torchvision, and configuring PyTorch in Jupyter Notebook and PyCharm
- Implementing MNIST handwritten digit recognition with PyTorch