【DigiKey Creative Contest】Rock, Scissors, Paper: AI Era + No.6. Work Submission Post
Rock, Paper, Scissors: The AI Era
-- Author: Media student
The formatting of this post differs slightly from the original document; you can download the PDF or DOC version directly.
1. Introduction
With the development of technology, AI applications have gradually been deployed on embedded hardware platforms, and some of these platforms are now capable of real-time inference; the Raspberry Pi is a typical example.
This project is called Rock, Scissors, Paper: AI Era. It uses a Raspberry Pi 4B (4GB), purchased from the Digi-Key Electronics online store, to deploy the Yolo-FastestV2 algorithm. Gesture images are collected with the second-generation Raspberry Pi camera module, the model is trained on a host PC, and the resulting network and weight data are then ported to the Raspberry Pi, which runs inference to recognize the three gestures rock, scissors, and paper in real time. After a game starts, the AI randomly generates a gesture, the player's gesture is recognized, and the two are compared to decide the result.
Photos of the work and the hardware:
Renderings of the work:
2. System Block Diagram
In this project, the AI model is trained on the PC, and the generated network structure, weight parameters, and related information are then imported into the Raspberry Pi, where the NCNN framework performs real-time gesture capture and detection.
I therefore chose the Raspberry Pi 4B as the inference platform: it is equipped with a Broadcom BCM2711 SoC featuring a quad-core 64-bit Cortex-A72 (ARMv8) CPU at 1.8 GHz and onboard LPDDR4-3200 SDRAM. Deep-learning inference still needs a fair amount of memory, so the 4GB variant was selected; if the budget allows, the 8GB version is recommended. A camera is also needed: first on the PC side to collect images for training, and later on the Raspberry Pi to capture the user's gestures in real time. In practice, GStreamer and OpenCV are used to obtain the video stream and grab frames. The overall design is shown in Figure 1.
Figure 1 System Block Diagram
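As mentioned above, the video stream is obtained with GStreamer and OpenCV. The following is only a minimal sketch of this approach, not code from the project: the libcamerasrc pipeline string, resolution, and frame rate are assumptions and must be adapted to the actual camera setup, and OpenCV must be built with GStreamer support.

```python
# Minimal sketch (not project code): grab frames from the Pi camera through a
# GStreamer pipeline with OpenCV. Pipeline string, resolution and frame rate
# are assumptions.
import cv2

pipeline = (
    "libcamerasrc ! video/x-raw,width=640,height=480,framerate=30/1 "
    "! videoconvert ! appsink"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

while cap.isOpened():
    ok, frame = cap.read()              # one BGR frame per iteration
    if not ok:
        break
    cv2.imshow("preview", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```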
The purple part of Figure 1 runs on a desktop PC, where PyCharm and CUDA are used to train the model on an RTX 4060 Ti; training takes about 30 minutes in total. Note that the PC should have as much memory as possible, otherwise the batch size cannot be set very large: with 16GB of RAM, a batch size of 4, and 300 training epochs, training takes about 30 minutes. The details are described in Chapter 3. The yellow part of Figure 1 loads the model parameters produced by training into the NCNN framework on the Raspberry Pi for real-time inference. Because Python is an interpreted language, C++ is faster, so using NCNN gives a higher real-time inference frame rate.
Figure 2 Schematic diagram of software and hardware
Figure 2 is a block diagram of the software and hardware used for inference on the Raspberry Pi, from video acquisition to displaying the final result on the screen. The Raspberry Pi NoIR Camera V2 module uses the Sony IMX219 8-megapixel image sensor; its image quality is high enough for both training and inference.
Since the goal of the project is to play rock-paper-scissors against the AI, the execution flow of the game on the Raspberry Pi is shown in Figure 3. Each round ends in one of three results: lost, win, or tied, meaning the user loses, the user wins, or the round is a draw.
Figure 3 Schematic diagram of the game process of rock-paper-scissors
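The round-result decision itself is simple. Below is a minimal Python sketch of the comparison, purely for illustration: the 0/1/2 encoding of the gestures and the judge function are assumptions, and the actual game logic runs in C++ on the Raspberry Pi.

```python
import random

# Illustrative sketch only; the real game logic is implemented in C++ on the Pi.
# 0 = rock, 1 = scissors, 2 = paper (the label order is an assumption).
BEATS = {0: 1, 1: 2, 2: 0}  # key beats value: rock > scissors > paper > rock

def judge(ai_move: int, user_move: int) -> str:
    """Return the round result from the user's point of view."""
    if ai_move == user_move:
        return "tied"
    return "win" if BEATS[user_move] == ai_move else "lost"

ai_move = random.randint(0, 2)          # the AI picks a random gesture
print(judge(ai_move, user_move=0))      # user shows rock in this example
```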
3. Functional description of each part (combined with pictures and text)
3.1 Raspberry Pi hardware and software preparation
3.1.1 Raspberry Pi image installation
AI inference requires NCNN, OpenCV, and other software. Installing them all by hand is very time-consuming and may not succeed, so it is recommended to install them directly from the tutorial below, which saves a lot of time.
Raspberry Pi image with common AI software packages preinstalled:
This project uses the Bullseye 64-bit OS. Since AI model inference is CPU-intensive, a 64-bit OS is recommended, and for compatibility with some software it is best to use a stable OS release.
Figure 4 Raspberry Pi Bullseye 64-bit OS
3.1.2 Software and hardware debugging and preparation
On the Raspberry Pi side, the main preparation is to confirm that NCNN/OpenCV work and that the camera can capture images normally; on the PC side, the CUDA driver and related software need to be installed.
The following is the method I worked out after a lot of trial and error: first download PyCharm and Anaconda on Windows 11, set the Anaconda environment variables, create an env, open PyCharm in that env, pull the project code, and then run pip3 install -r requirements.txt.
This installs most of the required software, but PyTorch and CUDA have to be installed separately as follows; the CUDA version must match both your graphics card and your PyTorch build.
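A quick way to confirm that the PyTorch and CUDA installation is consistent is to check GPU visibility before starting any training. This is only a generic sanity check, not one of the project's scripts.

```python
# Generic sanity check that PyTorch and the CUDA driver/toolkit are matched;
# not part of the project's scripts.
import torch

print("PyTorch version :", torch.__version__)
print("CUDA available  :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU             :", torch.cuda.get_device_name(0))
```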
With the steps above, image acquisition and the local environment on the PC are basically in place.
3.2 Dataset Collection and Processing
Images are collected on the Raspberry Pi with the script take_photo.py. Running it creates an images directory in the current directory, and the captured photos are saved there:
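The actual take_photo.py is included in the source package; purely as an illustration, a minimal capture script of this kind could look like the sketch below. The camera index, key bindings, and file naming are assumptions, not taken from the original script.

```python
# Illustrative sketch of a take_photo.py-style capture script (not the original):
# press the space bar to save a frame into ./images, press q to quit.
import os
import cv2

os.makedirs("images", exist_ok=True)
cap = cv2.VideoCapture(0)               # camera index is an assumption
count = 0

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord(" "):
        cv2.imwrite(os.path.join("images", f"img_{count:04d}.jpg"), frame)
        count += 1
    elif key == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```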
After collecting the images, the data needs to be annotated with the Labelme tool, because the YOLO algorithm requires annotations prepared in advance for training. The annotated images then need to be split into a training set and a test set; for example, 90% of the images can be used for training and 10% for testing.
This is done with the two scripts dic_lab.py and labelmetoyolo.py; the path lists for the training and validation sets are then generated with gen_train_txt.py and gen_val_txt.py. These scripts, together with the datasets before and after processing, are all under Yolo-FastestV2/data_my in the PyCharm project.
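To give a sense of what those helper scripts do, here is a minimal sketch of a 90/10 split plus path-list generation. The directory layout and file names are assumptions; the real work is done by the project's own scripts listed above.

```python
# Illustrative sketch of the 90%/10% split and the train/val path lists;
# the real work is done by dic_lab.py, labelmetoyolo.py, gen_train_txt.py
# and gen_val_txt.py. The directory layout is an assumption.
import glob
import random

images = sorted(glob.glob("data_my/images/*.jpg"))
random.seed(0)                    # fixed seed so the split is reproducible
random.shuffle(images)

split = int(len(images) * 0.9)    # 90% training, 10% validation/test
with open("data_my/train.txt", "w") as f:
    f.write("\n".join(images[:split]) + "\n")
with open("data_my/val.txt", "w") as f:
    f.write("\n".join(images[split:]) + "\n")
```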
3.3 Model training (PC training)
For details, see Yolo_FastestV2_train in the project attachment, and set an appropriate batch_size according to your PC's memory. Training is divided into the following steps:
- Generate anchor box parameters: python genanchors.py --traintxt .\data_my\train.txt
- Fill the generated parameters into game.data
- Train: python train.py --data data_my/game.data
- Evaluation: python evaluation.py --data dataset_my/game.data --weights .\weights\*.pth
- Test: python test.py --data dataset_my/game.data --weights .\weights\*.pth --img .\dataset_my\test3.jpg
- Convert to ONNX: python pytorch2onnx.py --data dataset_my/game.data --weights .\weights\*.pth --output yolo-fastestv2.onnx
- Simplify with onnx-sim: python -m onnxsim yolo-fastestv2.onnx yolo-fastestv2-opt.onnx
Note that when using the attached project, you need to regenerate the training-set lists, because the paths in them are absolute; they can be regenerated by modifying the corresponding generation scripts. Before converting to NCNN, it is also worth quickly sanity-checking the exported ONNX model, as sketched below.
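For example, a minimal check with onnxruntime; this is optional and not one of the project's scripts, and the 1x3x352x352 input shape is an assumption based on the usual Yolo-FastestV2 default, so adjust it to your own configuration.

```python
# Optional sanity check (not one of the project's scripts): run one dummy
# inference on the simplified ONNX model with onnxruntime before converting
# it to NCNN. The 1x3x352x352 input shape is an assumption.
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("yolo-fastestv2-opt.onnx")
inp = sess.get_inputs()[0]
print("input:", inp.name, inp.shape)

dummy = np.random.rand(1, 3, 352, 352).astype(np.float32)
for out in sess.run(None, {inp.name: dummy}):
    print("output shape:", out.shape)
```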
3.4 Model deployment (Raspberry Pi deployment)
If you installed the OS as described above, go to the ~/ncnn/build/tools/ directory and run gen_yolofastestv2.sh, which automatically generates the NCNN model files.
Put the resulting yolo-fastestv2-opt.bin and yolo-fastestv2-opt.param files into the ~/software/YoloFastest_V2/ directory, then open Code::Blocks through MobaXterm on the PC to compile the NCNN project and run it.
AI_Gesture_game_source_code_for_raspi_4B.tar.gz in the final package is the Raspberry Pi Code::Blocks project. The overall flow of the program is shown below:
Figure 5 Program flow chart
The software flow of the whole project is shown above. After startup, press the s key to start a game and the q key to quit. The AI's gesture is generated as a random 0/1/2 from a changing seed, to keep the game fair; the user's gesture is detected and recognized by the AI in real time, and the result of each round is printed on the screen. For the actual effect, see the pictures in Chapter 1 or the video in Chapter 5.
The neural network structure is shown below (see yolo-fastestv2-opt.onnx.png in the work package for the complete diagram):
…
4. Source Code
Raspberry Pi image (with opencv/ncnn and other necessary software installed):
Source code link:
It contains the following source code, datasets, and corresponding scripts:
- PC training source code (including 818 annotated gesture images): Yolo_FastestV2_train.part1.rar & Yolo_FastestV2_train.part2.rar
- Raspberry Pi ncnn deployment source code: AI_Gesture_game_source_code_for_raspi_4B.tar.gz
5. Demonstration video of the work’s functions
The demonstration video covers four main parts: data collection, data annotation, model training, and result testing. The video link is as follows:
6. Project Summary
After more than a month of hard work, I finally experienced the feeling of finding a way through after many twists and turns. AI may sound old-fashioned to most embedded engineers, but actually using it is new, and it was certainly new to me. At first I was quite lost: at every step of setting up the training environment and the NCNN inference environment I had no idea where to start. In the end I worked out a complete AI engineering pipeline, from data collection and data labeling through PyTorch training to NCNN inference on the Raspberry Pi. From the game-experience point of view, it feels like a fairly complete game. Unfortunately, although the nominal real-time inference rate can reach 20 FPS, the actual experience may be below 10 FPS, possibly because the conversion from video stream to image frames takes too long. Now that the project is finished, the system's bottlenecks deserve careful study so that deeper optimization can be done.
Post sharing link summary:
【DigiKey Creative Contest】Rock, Scissors, Paper: AI Era + No.6. Final Post
7. Acknowledgements
Now that the project is complete, I would like to thank the forum for hosting this contest and Digi-Key for sponsoring it, which gave me the opportunity to make up my mind to explore the secrets of AI. My wife also gave me a lot of help with dataset collection and annotation; annotating thousands of images is a time-consuming and tedious task. Finally, I hope my work can offer you a different perspective: give embedded systems AI wings, change the world with AI, and lead the future with innovation.