This post was last edited by Hardcore Wang on 2024-6-6 10:41
I didn't get this deployment right the first time and kept hitting errors while setting up the environment. It turned out the hard disk was simply out of space! Worse, I then expanded the virtual disk directly in the VM, which corrupted data and left Ubuntu unable to boot, so I really had no time left to write a program to run the model. This time I'm following the forum expert's steps first!
The environment deployment below is very detailed; beginners can just follow along.
1. Getting to know the development board
This is the LuckFox Pico Max development board, based on the Rockchip RV1106 chip.
The development board I received looks like this: the board itself, an SC3336 camera, an RTC battery, and a DuPont cable at the back that I made myself for serial transmit/receive.
Note that the camera ribbon cable plugs in "reversed": lift the connector latch, insert the cable with the blue side facing the Ethernet port, then press the latch back down.
I won’t go into detail about the product introduction, you can check it out on the official website: https://wiki.luckfox.com/zh/Luckfox-Pico/Luckfox-Pico-quick-start
2. Image Burning
LuckFox Pico Max comes with SPI NAND flash and ships with a test system from the factory. Still, it's worth learning how to burn an image so that if anything goes wrong we can investigate properly.
1) Environmental preparation
- Install the driver: download the Rockchip driver assistant DriverAssitant (from the official website), open it and install the USB driver. The board does not need to be connected during this step. Restart the computer after the installation completes.
- Download the image (from the network disk). The full package is large (about 10 GB), so just download the dedicated luckfox_pico_pro_max_image. SD-card firmware is provided for LuckFox Pico and LuckFox Pico Mini A, and SPI FLASH firmware for LuckFox Pico Mini B and LuckFox Pico Plus/Pro/Max.
- Burn the image: download and unzip the burning tool (from the official website).
2) Detailed steps for burning
- Hold down the BOOT button while connecting the board to the computer, then release it; the Rockchip flash tool will show a MaskRom device.
- Load the firmware storage directory, reload the env file, and check all items.
- Click Download.
That's all it takes.
3) Check the environment (this step can be skipped, it is usually correct)
If burning succeeded, the development board will blink a red LED, and you can log in via SSH, Telnet, serial debugging, or ADB, and transfer files. I won't go into detail here; I'm used to ADB, so that's what I normally use. For details see: https://wiki.luckfox.com/zh/Luckfox-Pico/SSH-Telnet-Login
Also check the camera. You need to set the PC network adapter address to 172.32.0.100. For details see: https://wiki.luckfox.com/zh/Luckfox-Pico/CSI-Camera
3. SDK environment deployment (PC side)
Open Ubuntu 22.04. This environment installation was discussed in the first chapter.
1) Build a compilation environment
- Install the dependencies:
sudo apt update
sudo apt-get install -y git ssh make gcc gcc-multilib g++-multilib module-assistant expect g++ gawk texinfo libssl-dev bison flex fakeroot cmake unzip gperf autoconf device-tree-compiler libncurses5-dev pkg-config bc python-is-python3 passwd openssl openssh-server openssh-client vim file cpio rsync
- Get the latest SDK:
git clone https://gitee.com/LuckfoxTECH/luckfox-pico.git
2) SDK directory description
- SDK directory structure
├── build.sh - project/build.sh ---- SDK build script
├── media --------------------------- multimedia codec, ISP and other algorithm code (can be built standalone)
├── sysdrv -------------------------- U-Boot, kernel, rootfs directories (can be built standalone)
├── project ------------------------- reference applications, build configs and scripts
├── output -------------------------- images produced by the SDK build
└── tools --------------------------- image packing and flashing tools
- Image output directory
output/
├── image
│ ├── download.bin ---------------- device-side program used by the flashing tool for upgrade communication; only loaded into board RAM
│ ├── env.img --------------------- partition table and boot parameters
│ ├── uboot.img ------------------- U-Boot image
│ ├── idblock.img ----------------- loader image
│ ├── boot.img -------------------- kernel image
│ ├── rootfs.img ------------------ root filesystem image
│ └── userdata.img ---------------- userdata image
└── out
├── app_out --------------------- build output of the reference applications
├── media_out ------------------- build output of the media components
├── rootfs_xxx ------------------ filesystem packing directory
├── S20linkmount ---------------- partition mount script
├── sysdrv_out ------------------ build output of sysdrv
└── userdata -------------------- userdata
3) Compile the image file
Building the image yourself is not strictly necessary, but downloading the luckfox-pico source package is a must, as many of the tools below depend on it.
- Install the cross-compilation toolchain:
cd tools/linux/toolchain/arm-rockchip830-linux-uclibcgnueabihf/
source env_install_toolchain.sh
- Compile:
cd luckfox-pico
./build.sh lunch // I'm flashing to SPI flash, so I choose option 8
./build.sh allsave
I don't know if anyone else hit the same build failure I did; I later found there was no disk space left. After deleting some things it worked:
./build.sh clean
./build.sh lunch
./build.sh allsave
The first compilation takes a while, several hours of waiting, and it needs a lot of disk space; my 40 GB disk couldn't hold it at all, Ubuntu crashed outright, and I had to reinstall the system. So painful!!!
After a successful build, the generated firmware is stored in the SDK directory under output/image, and as before you can burn the image straight from that directory.
4. Run the RKNN model on the development board
1) Test the official examples
First use the official example to test that opencv-mobile can capture images from the camera, draw the fps in the top-left corner, and push an RTSP stream.
Create a folder and clone the repo. (You need to get past the firewall; if you can't, look for a mirror on Gitee yourself, or clone someone else's copy.)
mkdir workspace
cd workspace/
git clone https://github.com/luckfox-eng29/luckfox_pico_rtsp_opencv.git
Compile, and use an absolute path for the SDK!
cd luckfox_pico_rtsp_opencv/
//export LUCKFOX_SDK_PATH=<Your Luckfox-pico Sdk Path>
export LUCKFOX_SDK_PATH=/home/linux/Desktop/luckfox-pico
mkdir build
cd build
cmake ..
make && make install
Upload the compiled luckfox_rtsp_opencv_demo/luckfox_rtsp_opencv and the lib directory to the development board. I use Samba + ADB here.
A single data cable is enough to connect and transfer files, which is very convenient.
adb shell
ls
mkdir work
exit
adb push \\192.168.44.129\share\Desktop\workspace\luckfox_pico_rtsp_opencv\luckfox_rtsp_opencv_demo /work
adb push \\192.168.44.129\share\Desktop\workspace\luckfox_pico_rtsp_opencv\lib /work/luckfox_rtsp_opencv_demo
Log in to the board and first stop the default rtsp, which is used to transmit the video stream.
adb shell
RkLunch-stop.sh
You need to change the permissions before you can run it. luckfox_rtsp_opencv is a Linux executable file.
chmod 777 luckfox_rtsp_opencv
./luckfox_rtsp_opencv
Finally, you can open the stream in VLC and watch the video. Sometimes the board's network interface doesn't come up automatically; in that case just reboot the board.
2) Test a forum expert's example
At this point we can try inference with the RKNN model generated in the previous chapter; I won't repeat the model generation process.
Because the digit-recognition model has a slightly lower recognition rate when run with the official test code, first try running it with the code modified by the forum experts.
The expert's original post: https://en.eeworld.com/bbs/thread-1282745-1-1.html
First clone the expert's code:
git clone https://gitee.com/luyism/luckfox_rtsp_mnist.git
After cloning there is a luckfox_rtsp_mnist directory. Let's take a look inside.
linux@linux-virtual-machine:~/Desktop/workspace$ cd luckfox_rtsp_mnist/
linux@linux-virtual-machine:~/Desktop/workspace/luckfox_rtsp_mnist$ ls
3rdparty common lib README_CN.md
build image_show.png luckfox_rtsp_mnist_dir README.md
CMakeLists.txt include model
Like the official test code, it can be compiled and uploaded to the development board to run. The luckfox_rtsp_mnist_dir folder here is a complete, ready-to-run package. Let's first use the expert's model to test the recognition rate and see the effect:
cd luckfox_rtsp_mnist/
//export LUCKFOX_SDK_PATH=<Your Luckfox-pico Sdk Path>
export LUCKFOX_SDK_PATH=/home/linux/Desktop/luckfox-pico
mkdir build
cd build
cmake ..
make && make install
Upload the file to the board in advance
adb push \\192.168.44.129\share\Desktop\workspace\luckfox_rtsp_mnist\luckfox_rtsp_mnist_dir /work
First stop the default rtsp, which is used to transmit video streams.
adb shell
RkLunch-stop.sh
Likewise change the permissions first, then run the following. Here luckfox_rtsp_mnist is the executable and model.rknn is the corresponding model:
chmod 777 luckfox_rtsp_mnist
./luckfox_rtsp_mnist ./model/model.rknn
The handwritten 3 here wasn't recognized, but 8 was OK, so make your strokes thicker when testing recognition.
3) Run the RKNN model we generated ourselves
Now let's use the model we generated ourselves and compare it against the expert's model.
Directly on the development board, copy the expert's project directory for our own test:
cp luckfox_rtsp_mnist_dir/ luckfox_mnist_test/ -r
cd luckfox_mnist_test/
Run directly with our own mnist model
./luckfox_rtsp_mnist ./model/mnist.rknn
From my run, model initialization appears to have failed. Looking at other people's model runs, the problem should be my own model, so I went back to regenerate it. Another ten thousand words omitted here...
A shot of it running with the original author's model, for comparison
5. Code Details
1) Official examples
1. Code framework
- 3rdparty: third-party libraries and dependencies; here it contains three subdirectories: allocator, librga, and rknpu2.
- build: files generated during the build (compilation) process.
- common: common code, libraries, and resources used in multiple places in the project.
- include: header files used in the project.
- lib: library files (such as .a or .so files).
- luckfox_rtsp_opencv_demo: code and resources for the project demo/example.
- src: source code files.
So we only need to focus on src. Inside the src folder there are just two files: luckfox_mpi.cc and main.cc.
File 1: luckfox_mpi.cc
This file contains some functions related to Rockchip Multimedia Processing Interface (MPI). These functions are mainly used to initialize and configure the video input (VI), video processing subsystem (VPSS) and video encoding (VENC) modules.
- TEST_COMM_GetNowUs(): returns the current timestamp in microseconds.
- vi_dev_init(): initializes the video input device. It first checks whether the device is configured and configures it if not, then checks whether it is enabled, enabling it and binding it to a specific pipe if not.
- vi_chn_init(): initializes the video input channel: number of buffers, memory type, resolution, pixel format, and so on.
- vpss_init(): initializes the video processing subsystem (VPSS), sets channel properties such as mode, dynamic range, pixel format, and resolution, and starts the VPSS group.
- venc_init(): initializes the video encoder: encoding type, pixel format, profile, resolution, frame width and height, number of buffers, bit rate, and so on, then starts receiving frames.
File 2: main.cc
This file is the main entry point of the program, which uses the OpenCV library to display video frames and Rockchip's MPI to process the video stream.
- Includes a series of standard C headers, Rockchip-specific headers, and OpenCV headers.
- The main function first initializes the ISP (Image Signal Processor), the MPI system, and the RTSP session, and sets up the video input and VPSS.
- vi_dev_init and vi_chn_init initialize the video input device and channel.
- vpss_init initializes VPSS, and VI is bound to VPSS.
- venc_init initializes the video encoder.
- In an infinite loop, the program acquires frames from VPSS, annotates them with OpenCV, and calculates the frame rate.
- Frames are encoded into an H264 stream and sent over the RTSP session.
- After the loop finishes, the program releases its resources and exits.
These two files work together to implement a video capture, processing, and encoding process, and stream video data through the RTSP protocol.
2. Code analysis
We only need to understand the main.cc file here. There are only 174 lines in total, and we only need to pay attention to the following key parts:
/*****************************************************************************
* | Author : Luckfox team
* | Function :
* | Info :
*
*----------------
* | This version: V1.0
* | Date : 2024-04-07
* | Info : Basic version
*
******************************************************************************/
/* Header includes: standard C library headers, Rockchip platform-specific headers, and OpenCV headers. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/poll.h>
#include <time.h>
#include <unistd.h>
#include <vector>
#include "rtsp_demo.h"
#include "luckfox_mpi.h"
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
int main(int argc, char *argv[]) {
/*
Variable definitions:
s32Ret: stores return values of function calls.
width and height: width and height of the video frame.
fps_text and fps: used to display the frame rate.
stFrame and h264_frame: hold the encoded stream data and video frame info.
*/
RK_S32 s32Ret = 0;
int sX,sY,eX,eY;
int width = 2304;
int height = 1296;
char fps_text[16];
float fps = 0;
memset(fps_text,0,16);
//h264_frame
VENC_STREAM_S stFrame;
stFrame.pstPack = (VENC_PACK_S *)malloc(sizeof(VENC_PACK_S));
VIDEO_FRAME_INFO_S h264_frame;
VIDEO_FRAME_INFO_S stVpssFrame;
/*
ISP initialization: SAMPLE_COMM_ISP_Init and SAMPLE_COMM_ISP_Run initialize and run the image signal processor.
*/
// rkaiq init
RK_BOOL multi_sensor = RK_FALSE;
const char *iq_dir = "/etc/iqfiles";
rk_aiq_working_mode_t hdr_mode = RK_AIQ_WORKING_MODE_NORMAL;
//hdr_mode = RK_AIQ_WORKING_MODE_ISP_HDR2;
SAMPLE_COMM_ISP_Init(0, hdr_mode, multi_sensor, iq_dir);
SAMPLE_COMM_ISP_Run(0);
/*
MPI system initialization: RK_MPI_SYS_Init initializes Rockchip's multimedia processing interface.
*/
// rkmpi init
if (RK_MPI_SYS_Init() != RK_SUCCESS) {
RK_LOGE("rk mpi sys init fail!");
return -1;
}
/*
RTSP session initialization: create the RTSP demo handle and session, set the video encoding format, and synchronize timestamps.
*/
// rtsp init
rtsp_demo_handle g_rtsplive = NULL;
rtsp_session_handle g_rtsp_session;
g_rtsplive = create_rtsp_demo(554);
g_rtsp_session = rtsp_new_session(g_rtsplive, "/live/0");
rtsp_set_video(g_rtsp_session, RTSP_CODEC_ID_VIDEO_H264, NULL, 0);
rtsp_sync_video_ts(g_rtsp_session, rtsp_get_reltime(), rtsp_get_ntptime());
/*
Video input initialization: vi_dev_init and vi_chn_init initialize the video input device and channel.
*/
// vi init
vi_dev_init();
vi_chn_init(0, width, height);
/*
VPSS initialization: vpss_init initializes the video processing subsystem.
*/
// vpss init
vpss_init(0, width, height);
/*
Bind VI to VPSS: RK_MPI_SYS_Bind connects the video input to the video processing subsystem.
*/
// bind vi to vpss
MPP_CHN_S stSrcChn, stvpssChn;
stSrcChn.enModId = RK_ID_VI;
stSrcChn.s32DevId = 0;
stSrcChn.s32ChnId = 0;
stvpssChn.enModId = RK_ID_VPSS;
stvpssChn.s32DevId = 0;
stvpssChn.s32ChnId = 0;
printf("====RK_MPI_SYS_Bind vi0 to vpss0====\n");
s32Ret = RK_MPI_SYS_Bind(&stSrcChn, &stvpssChn);
if (s32Ret != RK_SUCCESS) {
RK_LOGE("bind 0 ch venc failed");
return -1;
}
/*
Video encoder initialization: venc_init initializes the video encoder and sets the encoding parameters.
*/
// venc init
RK_CODEC_ID_E enCodecType = RK_VIDEO_ID_AVC;
venc_init(0, width, height, enCodecType);
while(1)
{
/*
Get a VPSS frame: fetch one video frame from VPSS. 0,0 are the group and channel IDs, &stVpssFrame receives the frame, and -1 is the timeout, meaning wait indefinitely for a frame.
*/
// get vpss frame
s32Ret = RK_MPI_VPSS_GetChnFrame(0,0, &stVpssFrame,-1);
/*
Process and display the frame: convert the frame's memory block handle to a virtual address, wrap it in an OpenCV Mat, draw the fps text onto the frame, and copy the processed image data back into the frame's memory.
*/
if(s32Ret == RK_SUCCESS)
{
void *data = RK_MPI_MB_Handle2VirAddr(stVpssFrame.stVFrame.pMbBlk);
cv::Mat frame(height,width,CV_8UC3, data);
sprintf(fps_text,"fps = %.2f",fps);
cv::putText(frame,fps_text,
cv::Point(40, 40),
cv::FONT_HERSHEY_SIMPLEX,1,
cv::Scalar(0,255,0),2);
memcpy(data, frame.data, width * height * 3);
}
/*
Encode the H264 frame: send the processed frame to the video encoder.
*/
// send stream
// encode H264
RK_MPI_VENC_SendFrame(0, &stVpssFrame,-1);
/*
Get the encoded stream: fetch the encoded video stream from the encoder.
*/
// rtsp
s32Ret = RK_MPI_VENC_GetStream(0, &stFrame, -1);
/*
RTSP transmission: if the RTSP session has been initialized, send the encoded stream over RTSP.
*/
if(s32Ret == RK_SUCCESS)
{
if(g_rtsplive && g_rtsp_session)
{
//printf("len = %d PTS = %d \n",stFrame.pstPack->u32Len, stFrame.pstPack->u64PTS);
void *pData = RK_MPI_MB_Handle2VirAddr(stFrame.pstPack->pMbBlk);
rtsp_tx_video(g_rtsp_session, (uint8_t *)pData, stFrame.pstPack->u32Len,
stFrame.pstPack->u64PTS);
rtsp_do_event(g_rtsplive);
}
/*
Frame-rate calculation: TEST_COMM_GetNowUs returns the current timestamp; fps is derived from the time difference between consecutive frames.
*/
RK_U64 nowUs = TEST_COMM_GetNowUs();
fps = (float) 1000000 / (float)(nowUs - stVpssFrame.stVFrame.u64PTS);
}
/*
Release resources: free the VPSS frame and the encoded stream. This is essential to avoid memory leaks.
*/
// release frame
s32Ret = RK_MPI_VPSS_ReleaseChnFrame(0, 0, &stVpssFrame);
if (s32Ret != RK_SUCCESS) {
RK_LOGE("RK_MPI_VI_ReleaseChnFrame fail %x", s32Ret);
}
s32Ret = RK_MPI_VENC_ReleaseStream(0, &stFrame);
if (s32Ret != RK_SUCCESS) {
RK_LOGE("RK_MPI_VENC_ReleaseStream fail %x", s32Ret);
}
}
/*
Cleanup phase:
Unbind VI from VPSS.
Stop and destroy the VPSS group, stop the ISP, stop the encoder from receiving frames, and destroy the encoder channel.
Free the allocated memory and exit the MPI system.
*/
RK_MPI_SYS_UnBind(&stSrcChn, &stvpssChn);
RK_MPI_VI_DisableChn(0, 0);
RK_MPI_VI_DisableDev(0);
RK_MPI_VPSS_StopGrp(0);
RK_MPI_VPSS_DestroyGrp(0);
SAMPLE_COMM_ISP_Stop(0);
RK_MPI_VENC_StopRecvFrame(0);
RK_MPI_VENC_DestroyChn(0);
free(stFrame.pstPack);
if (g_rtsplive)
rtsp_del_demo(g_rtsplive);
RK_MPI_SYS_Exit();
return 0;
}
Finally, we summarize the overall code logic:
- Initialization phase:
  - Set the video frame width and height.
  - Initialize the frame-rate display text and calculation variables.
  - Initialize the structures for the encoded stream and video frame info.
  - Initialize the ISP module.
  - Initialize the Rockchip MPI system.
  - Initialize the RTSP session and set the video stream parameters.
  - Initialize the video input device and VPSS.
  - Bind the video input (VI) to the video processing subsystem (VPSS).
  - Initialize the video encoder and set encoding parameters.
- Main loop phase:
  - Get a VPSS frame: fetch a video frame from VPSS.
  - Process and display the frame:
    - Convert the frame's memory block address to a virtual address.
    - Create an OpenCV Mat to process and annotate the frame.
    - Update the frame contents, e.g. copy the processed image data back into the frame's memory.
  - Encode the H264 frame: send the processed frame to the video encoder.
  - Get the encoded stream: fetch the encoded video stream from the encoder.
  - RTSP transmission: send the encoded stream over the RTSP session.
  - Calculate the frame rate: from the current time and the previous frame's timestamp.
  - Release resources: free the VPSS frame and the encoded stream.
- Cleanup phase:
  - Unbind VI from VPSS.
  - Stop and destroy the VPSS group.
  - Stop the ISP module.
  - Stop the encoder from receiving frames and destroy the encoder channel.
  - Free the allocated memory.
  - Exit the MPI system.
Basically, the whole flow boils down to: initialize (ISP → MPI → RTSP → VI → VPSS → bind → VENC), loop (get frame → annotate → encode → send → release), then clean up.
2) The forum expert's example
1. Code framework
- 3rdparty: third-party libraries and dependencies used in the project.
- build: files generated during the build (compilation) process.
- common: common code, libraries, and resources used in multiple places in the project.
- include: header files used in the project.
- lib: library files (such as .a or .so files).
- luckfox_rtsp_mnist_dir: related to MNIST processing/recognition; contains the model files and library files.
- model: model files used in the project, e.g. configuration files or data models.
- src: source code files; this is where the project code lives.
You can see there is one more folder, model, than in the official example, because we need it to store the RKNN model we generated ourselves for the subsequent build.
To see this part of the modification, open the CMakeLists.txt file and jump to the end:
set(CMAKE_INSTALL_PREFIX "${CMAKE_CURRENT_SOURCE_DIR}/luckfox_rtsp_mnist_dir")
file(GLOB RKNN_FILES "${CMAKE_CURRENT_SOURCE_DIR}/model/model.rknn")
install(TARGETS ${PROJECT_NAME} DESTINATION ${CMAKE_INSTALL_PREFIX})
install(FILES ${RKNN_FILES} DESTINATION ${CMAKE_INSTALL_PREFIX}/model)
This part was modified by the expert and redefines the project's install rules.
- Set the installation prefix:
set(CMAKE_INSTALL_PREFIX "${CMAKE_CURRENT_SOURCE_DIR}/luckfox_rtsp_mnist_dir")
This command sets the target path for installation. CMAKE_INSTALL_PREFIX is the CMake variable that defines the install root for the build results. Here it is set to ${CMAKE_CURRENT_SOURCE_DIR}/luckfox_rtsp_mnist_dir, i.e. a subdirectory named luckfox_rtsp_mnist_dir under the project source directory.
- Collect files:
file(GLOB RKNN_FILES "${CMAKE_CURRENT_SOURCE_DIR}/model/model.rknn")
This command uses file(GLOB ...) to collect a list of files matching a pattern (globbing). Here it finds the files named model.rknn in ${CMAKE_CURRENT_SOURCE_DIR}/model/ and stores their paths in the variable RKNN_FILES.
- Install the target:
install(TARGETS ${PROJECT_NAME} DESTINATION ${CMAKE_INSTALL_PREFIX})
This command specifies the install rule for the build targets (executables or library files). ${PROJECT_NAME} is the name defined in the project() command, assumed here to be a build target; DESTINATION specifies the install directory for it, i.e. the CMAKE_INSTALL_PREFIX set above.
- Install the files:
install(FILES ${RKNN_FILES} DESTINATION ${CMAKE_INSTALL_PREFIX}/model)
This install() call installs plain files. ${RKNN_FILES} is the list collected by file(GLOB ...) above, and DESTINATION is ${CMAKE_INSTALL_PREFIX}/model, so the matching model.rknn files end up in luckfox_rtsp_mnist_dir/model.
2. Code analysis
We only need to understand the main.cc file here. It is 723 lines in total. Let's see how the expert modified the code:
/*****************************************************************************
* | Author : Luckfox team
* | Modified By : knv luyism
* | Function :
* | Info :
*
*----------------
* | This version: V1.1
* | Date : 2024-05-23
* | Info : Basic version
* | Function Add: 1. Add the function of recognizing numbers in the image
* | 2. Add the function of displaying the recognized number and its probability
* | 3. Add the function of displaying the frame rate
*
******************************************************************************/
/* Header includes: standard C library headers, Rockchip platform-specific headers, and OpenCV headers. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/poll.h>
#include <time.h>
#include <unistd.h>
#include <vector>
#include "rtsp_demo.h"
#include "luckfox_mpi.h"
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <rknn_api.h>
#define MODEL_WIDTH 28
#define MODEL_HEIGHT 28
#define CHANNEL_NUM 1
// Default: RKNN_TENSOR_UINT8
rknn_tensor_type input_type = RKNN_TENSOR_UINT8;
// Default: RKNN_TENSOR_NHWC
rknn_tensor_format input_layout = RKNN_TENSOR_NHWC;
// rknn_context ctx = 0;
// Struct storing a predicted digit and its probability
struct Prediction
{
int digit;
float probability;
};
// Global simple queue storing the predicted digits and their probabilities
std::vector<Prediction> predictions_queue;
// Struct storing the rknn context
typedef struct {
rknn_context rknn_ctx;
rknn_tensor_mem* max_mem;
rknn_tensor_mem* net_mem;
rknn_input_output_num io_num;
rknn_tensor_attr* input_attrs;
rknn_tensor_attr* output_attrs;
rknn_tensor_mem* input_mems[3];
rknn_tensor_mem* output_mems[3];
int model_channel;
int model_width;
int model_height;
bool is_quant;
} rknn_app_context_t;
// Load the model from a file
static unsigned char *load_model(const char *filename, int *model_size)
{
// Open the given file for reading binary data
FILE *fp = fopen(filename, "rb");
if (fp == nullptr)
{
printf("fopen %s fail!\n", filename);
return NULL;
}
fseek(fp, 0, SEEK_END);
int model_len = ftell(fp);
unsigned char *model = (unsigned char *)malloc(model_len); // allocate a buffer of the model's size and keep the pointer
fseek(fp, 0, SEEK_SET);
// Check whether reading the model data succeeded; on failure print an error, free the buffer, and return NULL.
if (model_len != fread(model, 1, model_len, fp))
{
printf("fread %s fail!\n", filename);
free(model);
return NULL;
}
*model_size = model_len;
if (fp)
{
fclose(fp);
}
return model;
}
// Print tensor attributes
static void dump_tensor_attr(rknn_tensor_attr *attr)
{
printf(" index=%d, name=%s, n_dims=%d, dims=[%d, %d, %d, %d], n_elems=%d, size=%d, fmt=%s, type=%s, qnt_type=%s, "
"zp=%d, scale=%f\n",
attr->index, attr->name, attr->n_dims, attr->dims[0], attr->dims[1], attr->dims[2], attr->dims[3],
attr->n_elems, attr->size, get_format_string(attr->fmt), get_type_string(attr->type),
get_qnt_type_string(attr->qnt_type), attr->zp, attr->scale);
}
// Find the contour of the digit in the image while damping contour jitter between frames
cv::Rect find_digit_contour(const cv::Mat &image) {
// Pre-process the image
cv::Mat gray, blurred, edged;
cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);
cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0);
cv::Canny(blurred, edged, 30, 150);
// Apply morphological operations
cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
cv::dilate(edged, edged, kernel);
cv::erode(edged, edged, kernel);
// Find contours; declare a variable to hold them
std::vector<std::vector<cv::Point>> contours;
cv::findContours(edged, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
if (contours.empty()) {
return cv::Rect();
}
// Find the largest contour
auto largest_contour = std::max_element(contours.begin(), contours.end(),
[](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b) {
return cv::contourArea(a) < cv::contourArea(b);
});
// Contour area filter: discard contours whose area is too small, reducing the influence of tiny contours on the result.
if (cv::contourArea(*largest_contour) < 10) {
return cv::Rect();
}
// Contour shape filter: besides area, consider shape features such as the aspect ratio to reject irregular contours and improve accuracy.
cv::Rect bounding_box = cv::boundingRect(*largest_contour);
float aspect_ratio = static_cast<float>(bounding_box.width) / bounding_box.height;
if (aspect_ratio < 0.2 || aspect_ratio > 3) {
return cv::Rect();
}
// Contour stability check:
// Compare the contour position across the current and previous frames.
// If the position changes little between frames, the contour is considered stable and needs no further adjustment.
static std::vector<cv::Rect> prev_bounding_boxes;
if (prev_bounding_boxes.size() > 5) {
prev_bounding_boxes.erase(prev_bounding_boxes.begin());
}
prev_bounding_boxes.push_back(bounding_box);
if (prev_bounding_boxes.size() == 5) {
float avg_width = 0.0;
float avg_height = 0.0;
for (const auto& box : prev_bounding_boxes) {
avg_width += box.width;
avg_height += box.height;
}
avg_width /= prev_bounding_boxes.size();
avg_height /= prev_bounding_boxes.size();
float width_diff = std::abs(bounding_box.width - avg_width) / avg_width;
float height_diff = std::abs(bounding_box.height - avg_height) / avg_height;
if (width_diff > 0.1 || height_diff > 0.1) {
return cv::Rect();
}
}
// Expand the bounding box by 15 pixels in every direction
bounding_box.x = std::max(0, bounding_box.x - 15);
bounding_box.y = std::max(0, bounding_box.y - 15);
bounding_box.width = std::min(image.cols - bounding_box.x, bounding_box.width + 30);
bounding_box.height = std::min(image.rows - bounding_box.y, bounding_box.height + 30);
// Return the bounding box of the largest contour
return bounding_box;
}
// Pre-process the digit region
cv::Mat preprocess_digit_region(const cv::Mat &region)
{
// Convert to grayscale, resize to 28x28, and normalize pixel values to floats between 0 and 1
cv::Mat gray, resized, bitwized, normalized;
cv::cvtColor(region, gray, cv::COLOR_BGR2GRAY);
// Thicken the digit in the image to make it easier to recognize (Otsu binarization)
cv::threshold(gray, gray, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
// Inverted binary threshold: values above 127 become 0, values at or below become 255
cv::threshold(gray, gray, 127, 255, cv::THRESH_BINARY_INV);
// Invert black and white: black becomes white, white becomes black
cv::bitwise_not(gray, bitwized);
// Invert black and white again, by hand
for (int i = 0; i < bitwized.rows; i++)
{
for (int j = 0; j < bitwized.cols; j++)
{
bitwized.at<uchar>(i, j) = 255 - bitwized.at<uchar>(i, j);
}
}
// Resize to 28x28 without distorting the shape; short edges are padded with black
cv::resize(bitwized, resized, cv::Size(28, 28), 0, 0, cv::INTER_AREA);
// A local static counter, initialized only once; when it reaches 200 the image is saved to the current directory
static int count = 0;
if (count == 200)
{
cv::imwrite("pre.jpg", resized);
}
count++;
printf("count=%d\n", count);
return resized;
}
// Convert quantized INT8 data back to floating point
// Parameters:
// qnt: quantized integer value
// zp: zero point, used for the zero-point offset
// scale: scale factor mapping the quantized integer back to the float range
// Returns:
// the dequantized floating-point value
static float deqnt_affine_to_f32(int8_t qnt, int32_t zp, float scale) { return ((float)qnt - (float)zp) * scale; }
// Normalize the model output and compute the output probability distribution
// Parameters:
// output_attrs: output tensor attributes, including the zero point and scale factor
// output: model output data in INT8 format
// out_fp32: receives the normalized floating-point output
static void output_normalization(rknn_tensor_attr* output_attrs, uint8_t *output, float *out_fp32)
{
int32_t zp = output_attrs->zp;
float scale = output_attrs->scale;
// Dequantize the INT8 output data to floats and store them
for(int i = 0; i < 10; i ++)
out_fp32[i] = deqnt_affine_to_f32(output[i],zp,scale);
// Compute the L2 norm of the output data
float sum = 0;
for(int i = 0; i < 10; i++)
sum += out_fp32[i] * out_fp32[i];
// Divide by the norm so the output values fall within [0,1]
float norm = sqrt(sum);
for(int i = 0; i < 10; i++)
out_fp32[i] /= norm;
// Print the output values
printf("\n===================Output data values:===================\n");
for (int i = 0; i < 10; ++i)
{
printf("%f ", out_fp32[i]);
}
printf("\n");
// Find the digit with the highest probability and record it together with that probability
float max_prob = -1.0;
int predicted_digit = -1;
// Compute the index of the maximum value
for (int i = 0; i < 10; ++i)
{
if (out_fp32[i] > max_prob)
{
max_prob = out_fp32[i];
predicted_digit = i;
}
}
// Push the predicted digit and its probability onto the queue
predictions_queue.push_back({predicted_digit, max_prob});
// Print the predicted digit and its probability
printf("========Predicted digit: %d, Probability: %.2f========\n\n", predicted_digit, max_prob);
}
// init_mnist_model: initialize the mnist model
int init_mnist_model(const char *model_path,rknn_app_context_t *app_mnist_ctx)
{
int ret;
int model_len = 0;
rknn_context ctx_mnist = 0;
// char *model;
// ret = rknn_init(&ctx_mnist, (char *)model_path, 0, 0, NULL);
// if (ret < 0)
// {
// printf("rknn_init fail! ret=%d\n", ret);
// return -1;
// }
unsigned char * model = load_model(model_path, &model_len);
ret = rknn_init(&ctx_mnist, model, model_len, 0, NULL);
if (ret < 0)
{
printf("rknn_init failed! ret=%d", ret);
return -1;
}
// Get sdk and driver version
rknn_sdk_version sdk_ver;
ret = rknn_query(ctx_mnist, RKNN_QUERY_SDK_VERSION, &sdk_ver, sizeof(sdk_ver));
if (ret != RKNN_SUCC)
{
printf("rknn_query fail! ret=%d\n", ret);
return -1;
}
printf("rknn_api/rknnrt version: %s, driver version: %s\n", sdk_ver.api_version, sdk_ver.drv_version);
// Get Model Input Output Info
rknn_input_output_num io_num;
ret = rknn_query(ctx_mnist, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num));
if (ret != RKNN_SUCC)
{
printf("rknn_query fail! ret=%d\n", ret);
return -1;
}
printf("model input num: %d, output num: %d\n", io_num.n_input, io_num.n_output);
// Print the input tensor attributes
printf("\ninput tensors:\n");
rknn_tensor_attr input_attrs[io_num.n_input];
memset(input_attrs, 0, sizeof(input_attrs));
for (uint32_t i = 0; i < io_num.n_input; i++)
{
input_attrs[i].index = i;
// query info
ret = rknn_query(ctx_mnist, RKNN_QUERY_NATIVE_INPUT_ATTR, &(input_attrs[i]), sizeof(rknn_tensor_attr));
if (ret < 0)
{
printf("rknn_init error! ret=%d\n", ret);
return -1;
}
dump_tensor_attr(&input_attrs[i]);
}
// Print the output tensor attributes
printf("\noutput tensors:\n");
rknn_tensor_attr output_attrs[io_num.n_output];
memset(output_attrs, 0, sizeof(output_attrs));
for (uint32_t i = 0; i < io_num.n_output; i++)
{
output_attrs[i].index = i;
// When using the zero-copy API interface, query the native output tensor attribute
ret = rknn_query(ctx_mnist, RKNN_QUERY_NATIVE_NHWC_OUTPUT_ATTR, &(output_attrs[i]), sizeof(rknn_tensor_attr));
if (ret != RKNN_SUCC)
{
printf("rknn_query fail! ret=%d\n", ret);
return -1;
}
dump_tensor_attr(&output_attrs[i]);
}
// default input type is int8 (normalize and quantize need compute in outside)
// if set uint8, will fuse normalize and quantize to npu
input_attrs[0].type = input_type;
// default fmt is NHWC, npu only support NHWC in zero copy mode
input_attrs[0].fmt = input_layout;
printf("input_attrs[0].size_with_stride=%d\n", input_attrs[0].size_with_stride);
// Create input tensor memory
app_mnist_ctx->input_mems[0] = rknn_create_mem(ctx_mnist, input_attrs[0].size_with_stride);
// Set the input tensor memory
ret = rknn_set_io_mem(ctx_mnist, app_mnist_ctx->input_mems[0], &input_attrs[0]);
if (ret < 0)
{
printf("rknn_set_io_mem fail! ret=%d\n", ret);
return -1;
}
// Set the output tensor memory
for (uint32_t i = 0; i < io_num.n_output; ++i)
{
app_mnist_ctx->output_mems[i] = rknn_create_mem(ctx_mnist, output_attrs[i].size_with_stride);
// printf("output_attrs[%d].size_with_stride=%d\n", i, output_attrs[i].size_with_stride);
// set output memory and attribute
ret = rknn_set_io_mem(ctx_mnist, app_mnist_ctx->output_mems[i], &output_attrs[i]);
if (ret < 0)
{
printf("rknn_set_io_mem fail! ret=%d\n", ret);
return -1;
}
}
// Store the model context in app_mnist_ctx
app_mnist_ctx->rknn_ctx = ctx_mnist;
// Record whether the output is affine-quantized
app_mnist_ctx->is_quant = (output_attrs[0].qnt_type == RKNN_TENSOR_QNT_AFFINE_ASYMMETRIC);
app_mnist_ctx->io_num = io_num;
app_mnist_ctx->input_attrs = (rknn_tensor_attr *)malloc(io_num.n_input * sizeof(rknn_tensor_attr));
memcpy(app_mnist_ctx->input_attrs, input_attrs, io_num.n_input * sizeof(rknn_tensor_attr));
app_mnist_ctx->output_attrs = (rknn_tensor_attr *)malloc(io_num.n_output * sizeof(rknn_tensor_attr));
memcpy(app_mnist_ctx->output_attrs, output_attrs, io_num.n_output * sizeof(rknn_tensor_attr));
printf("model is NHWC input fmt\n");
app_mnist_ctx->model_height = input_attrs[0].dims[1];
app_mnist_ctx->model_width = input_attrs[0].dims[2];
app_mnist_ctx->model_channel = input_attrs[0].dims[3];
// Print the model input dimensions
printf("model input height=%d, width=%d, channel=%d\n",
app_mnist_ctx->model_height, app_mnist_ctx->model_width, app_mnist_ctx->model_channel);
printf("Init success \n");
return 0;
}
// Release the model's memory and context
int release_mnist_model(rknn_app_context_t *app_ctx)
{
// Destroy the tensor memories first -- rknn_destroy_mem needs a valid context
if (app_ctx->net_mem != NULL)
{
rknn_destroy_mem(app_ctx->rknn_ctx, app_ctx->net_mem);
app_ctx->net_mem = NULL;
}
if (app_ctx->max_mem != NULL)
{
rknn_destroy_mem(app_ctx->rknn_ctx, app_ctx->max_mem);
app_ctx->max_mem = NULL;
}
for (uint32_t i = 0; i < app_ctx->io_num.n_input; i++) {
if (app_ctx->input_mems[i] != NULL) {
rknn_destroy_mem(app_ctx->rknn_ctx, app_ctx->input_mems[i]);
app_ctx->input_mems[i] = NULL;
}
}
for (uint32_t i = 0; i < app_ctx->io_num.n_output; i++) {
if (app_ctx->output_mems[i] != NULL) {
rknn_destroy_mem(app_ctx->rknn_ctx, app_ctx->output_mems[i]);
app_ctx->output_mems[i] = NULL;
}
}
if (app_ctx->input_attrs != NULL)
{
free(app_ctx->input_attrs);
app_ctx->input_attrs = NULL;
}
if (app_ctx->output_attrs != NULL)
{
free(app_ctx->output_attrs);
app_ctx->output_attrs = NULL;
}
// Destroy the RKNN context last
if (app_ctx->rknn_ctx != 0)
{
rknn_destroy(app_ctx->rknn_ctx);
app_ctx->rknn_ctx = 0;
}
return 0;
}
int run_inference(rknn_app_context_t *app_ctx, cv::Mat &frame)
{
int ret;
//**************** Copy the frame into the inference input memory ****************//
int width = app_ctx->input_attrs[0].dims[2];
int stride = app_ctx->input_attrs[0].w_stride;
if (width == stride)
{
memcpy(app_ctx->input_mems[0]->virt_addr, frame.data, width * app_ctx->input_attrs[0].dims[1] * app_ctx->input_attrs[0].dims[3]);
}
else
{
int height = app_ctx->input_attrs[0].dims[1];
int channel = app_ctx->input_attrs[0].dims[3];
// copy from src to dst with stride
uint8_t *src_ptr = frame.data;
uint8_t *dst_ptr = (uint8_t *)app_ctx->input_mems[0]->virt_addr;
// width-channel elements
int src_wc_elems = width * channel;
int dst_wc_elems = stride * channel;
for (int h = 0; h < height; ++h)
{
memcpy(dst_ptr, src_ptr, src_wc_elems);
src_ptr += src_wc_elems;
dst_ptr += dst_wc_elems;
}
}
// Run inference
ret = rknn_run(app_ctx->rknn_ctx, nullptr);
if (ret < 0)
{
printf("rknn_run failed! ret=%d\n", ret);
return -1;
}
// Read the quantized output straight from the zero-copy output buffer
uint8_t *output = (uint8_t *)app_ctx->output_mems[0]->virt_addr;
float out_fp32[10];
output_normalization(&app_ctx->output_attrs[0], output, out_fp32);
return 0;
}
int main(int argc, char *argv[])
{
// rknn init
if (argc != 2)
{
printf("Usage: %s <model.rknn>\n", argv[0]);
return -1;
}
char *model_path = argv[1];
RK_S32 s32Ret = 0;
int sX, sY, eX, eY;
int width = 640;
int height = 480;
char fps_text[16];
float fps = 0;
memset(fps_text, 0, 16);
// h264_frame
VENC_STREAM_S stFrame;
stFrame.pstPack = (VENC_PACK_S *)malloc(sizeof(VENC_PACK_S));
VIDEO_FRAME_INFO_S h264_frame;
VIDEO_FRAME_INFO_S stVpssFrame;
// rkaiq init
RK_BOOL multi_sensor = RK_FALSE;
const char *iq_dir = "/etc/iqfiles";
rk_aiq_working_mode_t hdr_mode = RK_AIQ_WORKING_MODE_NORMAL;
// hdr_mode = RK_AIQ_WORKING_MODE_ISP_HDR2;
SAMPLE_COMM_ISP_Init(0, hdr_mode, multi_sensor, iq_dir);
SAMPLE_COMM_ISP_Run(0);
// rkmpi init
if (RK_MPI_SYS_Init() != RK_SUCCESS)
{
RK_LOGE("rk mpi sys init fail!");
return -1;
}
// rtsp init
rtsp_demo_handle g_rtsplive = NULL;
rtsp_session_handle g_rtsp_session;
g_rtsplive = create_rtsp_demo(554);
g_rtsp_session = rtsp_new_session(g_rtsplive, "/live/0");
rtsp_set_video(g_rtsp_session, RTSP_CODEC_ID_VIDEO_H264, NULL, 0);
rtsp_sync_video_ts(g_rtsp_session, rtsp_get_reltime(), rtsp_get_ntptime());
// vi init
vi_dev_init();
vi_chn_init(0, width, height);
// vpss init
vpss_init(0, width, height);
// bind vi to vpss
MPP_CHN_S stSrcChn, stvpssChn;
stSrcChn.enModId = RK_ID_VI;
stSrcChn.s32DevId = 0;
stSrcChn.s32ChnId = 0;
stvpssChn.enModId = RK_ID_VPSS;
stvpssChn.s32DevId = 0;
stvpssChn.s32ChnId = 0;
printf("====RK_MPI_SYS_Bind vi0 to vpss0====\n");
s32Ret = RK_MPI_SYS_Bind(&stSrcChn, &stvpssChn);
if (s32Ret != RK_SUCCESS)
{
RK_LOGE("bind 0 ch venc failed");
return -1;
}
// venc init
RK_CODEC_ID_E enCodecType = RK_VIDEO_ID_AVC;
venc_init(0, width, height, enCodecType);
// RKNN context struct
rknn_app_context_t app_mnist_ctx;
memset(&app_mnist_ctx, 0, sizeof(rknn_app_context_t));
init_mnist_model(model_path, &app_mnist_ctx);
while (1)
{
// get vpss frame
s32Ret = RK_MPI_VPSS_GetChnFrame(0, 0, &stVpssFrame, -1);
if (s32Ret == RK_SUCCESS)
{
void *data = RK_MPI_MB_Handle2VirAddr(stVpssFrame.stVFrame.pMbBlk);
// Wrap the frame buffer in a cv::Mat and run the model on it
cv::Mat frame(height, width, CV_8UC3, data);
// Find the digit contour in the image
cv::Rect digit_rect = find_digit_contour(frame);
if (digit_rect.area() > 0)
{
cv::Mat digit_region = frame(digit_rect);
cv::Mat preprocessed = preprocess_digit_region(digit_region);
// Run inference
run_inference(&app_mnist_ctx, preprocessed);
// Fetch the predicted digit and its probability from predictions_queue
if (!predictions_queue.empty())
{
Prediction prediction = predictions_queue.back();
cv::rectangle(frame, digit_rect, cv::Scalar(0, 255, 0), 2);
// Draw the predicted digit above the box (font scale 1, thickness 2)
cv::putText(frame, std::to_string(prediction.digit), cv::Point(digit_rect.x, digit_rect.y - 10),
cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(255, 0, 0), 2);
// Draw the prediction probability next to it
cv::putText(frame, std::to_string(prediction.probability), cv::Point(digit_rect.x+ 30, digit_rect.y - 10),
cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(230, 0, 0), 2);
// Optionally print the predicted digit and its probability
// printf("****** Predicted digit: %d, Probability: %.2f ******\n", prediction.digit, prediction.probability);
// Remove the entry we just displayed
predictions_queue.pop_back();
}
}
sprintf(fps_text, "fps:%.2f", fps);
cv::putText(frame, fps_text,
cv::Point(40, 40),
cv::FONT_HERSHEY_SIMPLEX, 1,
cv::Scalar(0, 255, 0), 2);
memcpy(data, frame.data, width * height * 3);
}
// send stream
// encode H264
RK_MPI_VENC_SendFrame(0, &stVpssFrame, -1);
// rtsp
s32Ret = RK_MPI_VENC_GetStream(0, &stFrame, -1);
if (s32Ret == RK_SUCCESS)
{
if (g_rtsplive && g_rtsp_session)
{
// printf("len = %d PTS = %d \n",stFrame.pstPack->u32Len, stFrame.pstPack->u64PTS);
void *pData = RK_MPI_MB_Handle2VirAddr(stFrame.pstPack->pMbBlk);
rtsp_tx_video(g_rtsp_session, (uint8_t *)pData, stFrame.pstPack->u32Len,
stFrame.pstPack->u64PTS);
rtsp_do_event(g_rtsplive);
}
RK_U64 nowUs = TEST_COMM_GetNowUs();
fps = (float)1000000 / (float)(nowUs - stVpssFrame.stVFrame.u64PTS);
}
// release frame
s32Ret = RK_MPI_VPSS_ReleaseChnFrame(0, 0, &stVpssFrame);
if (s32Ret != RK_SUCCESS)
{
RK_LOGE("RK_MPI_VPSS_ReleaseChnFrame fail %x", s32Ret);
}
s32Ret = RK_MPI_VENC_ReleaseStream(0, &stFrame);
if (s32Ret != RK_SUCCESS)
{
RK_LOGE("RK_MPI_VENC_ReleaseStream fail %x", s32Ret);
}
}
RK_MPI_SYS_UnBind(&stSrcChn, &stvpssChn);
RK_MPI_VI_DisableChn(0, 0);
RK_MPI_VI_DisableDev(0);
RK_MPI_VPSS_StopGrp(0);
RK_MPI_VPSS_DestroyGrp(0);
SAMPLE_COMM_ISP_Stop(0);
RK_MPI_VENC_StopRecvFrame(0);
RK_MPI_VENC_DestroyChn(0);
free(stFrame.pstPack);
if (g_rtsplive)
rtsp_del_demo(g_rtsplive);
RK_MPI_SYS_Exit();
// Release the model memory
release_mnist_model(&app_mnist_ctx);
return 0;
}
As you can see, this main.cc extends the official example by adding digit recognition with the RKNN model. The main additions are:
- RKNN model loading and inference:
  - Load the RKNN model into memory.
  - Define a structure to store predicted digits and their probabilities.
  - Detect digit contours in the image and preprocess the region.
  - Run model inference on the preprocessed image data.
  - Normalize the model output, compute probabilities, and store the prediction results.
- Digit recognition and display:
  - In the main loop, detect digits in each frame and recognize them with the RKNN model.
  - Draw the recognized digit and its probability on the frame.
  - Maintain a prediction queue and display the latest result on the frame.
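The prediction queue described above can be sketched in isolation. The struct and container names below mirror what main.cc appears to use (`Prediction`, `predictions_queue`); the helper functions are hypothetical, added only to show the push/pop flow:

```cpp
#include <deque>

// Hypothetical mirror of the record main.cc stores per inference
struct Prediction {
    int digit;         // predicted class, 0-9
    float probability; // normalized score from output_normalization
};

std::deque<Prediction> predictions_queue;

// Inference side: append the newest result
void record_prediction(int digit, float prob) {
    predictions_queue.push_back({digit, prob});
}

// Display side: read the latest result and drop it
Prediction take_latest() {
    Prediction p = predictions_queue.back();
    predictions_queue.pop_back();
    return p;
}
```

Because the display loop always reads `back()` and then pops it, the queue normally holds at most the single most recent prediction.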
Let's walk through what the main function does:
int main(int argc, char *argv[]) {
// ... common initialization code omitted ...
// rknn init
if (argc != 2) {
printf("Usage: %s <model.rknn>\n", argv[0]);
return -1;
}
char *model_path = argv[1];
// ------- Initialize the RKNN model
rknn_app_context_t app_mnist_ctx;
memset(&app_mnist_ctx, 0, sizeof(rknn_app_context_t));
init_mnist_model(model_path, &app_mnist_ctx);
while (1) {
// get vpss frame
s32Ret = RK_MPI_VPSS_GetChnFrame(0, 0, &stVpssFrame, -1);
if (s32Ret == RK_SUCCESS) {
void *data = RK_MPI_MB_Handle2VirAddr(stVpssFrame.stVFrame.pMbBlk);
// Wrap the frame buffer in a cv::Mat
cv::Mat frame(height, width, CV_8UC3, data);
// ------- Find the digit contour in the image
cv::Rect digit_rect = find_digit_contour(frame);
if (digit_rect.area() > 0) {
cv::Mat digit_region = frame(digit_rect);
cv::Mat preprocessed = preprocess_digit_region(digit_region);
// ------- Run inference
run_inference(&app_mnist_ctx, preprocessed);
// ------- Fetch the predicted digit and its probability from predictions_queue
if (!predictions_queue.empty()) {
Prediction prediction = predictions_queue.back();
cv::rectangle(frame, digit_rect, cv::Scalar(0, 255, 0), 2);
cv::putText(frame, std::to_string(prediction.digit), cv::Point(digit_rect.x, digit_rect.y - 10),
cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(255, 0, 0), 2);
cv::putText(frame, std::to_string(prediction.probability), cv::Point(digit_rect.x + 30, digit_rect.y - 10),
cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(230, 0, 0), 2);
// ------- Remove the entry we just displayed
predictions_queue.pop_back();
}
}
sprintf(fps_text, "fps:%.2f", fps);
cv::putText(frame, fps_text,
cv::Point(40, 40),
cv::FONT_HERSHEY_SIMPLEX, 1,
cv::Scalar(0, 255, 0), 2);
memcpy(data, frame.data, width * height * 3);
}
// send stream
// ... common encode-and-send code omitted ...
// release frame
// ... common frame-release code omitted ...
}
// ... common cleanup code omitted ...
// ------- Release the RKNN model memory
release_mnist_model(&app_mnist_ctx);
return 0;
}
As you can see, the key technique is the zero-copy API. If you want to learn more, read the original author's post: https://en.eeworld.com/bbs/thread-1282745-1-1.html
The general framework is as follows (framework diagram from the original post):
VI. Conclusion
Finally, let's summarize:
The model reaches an accuracy as high as 98% when tested on a computer, but only about 60% when recognizing digits captured live from the camera.
Possible reasons:
- Lighting and image quality: live camera images may suffer from poor lighting conditions, degrading image quality.
- Image resolution and scaling: the live capture resolution differs from the resolution used in training, and scaling loses information.
- Model input inconsistency: live images may not match the training images in size, color space, or normalization.
- Camera hardware limitations: the camera's resolution, focus, and lens quality can limit image clarity and accuracy.
Improvement measures:
- Improve the camera setup: make sure the camera works under sufficient lighting, or use a camera with better resolution, focus, or lens quality.
- Adjust image preprocessing: apply more careful preprocessing to the live images, such as histogram equalization and contrast enhancement.
- Optimize model input: ensure the resolution, color space, and normalization of live images match those used during training.
- Retrain the model: further train or fine-tune the model on live-captured images to adapt it to the real scenario.
- Increase model complexity: try a more complex model structure to improve recognition accuracy.
- Optimize performance: use a more capable chip, and optimize code and algorithms to reduce real-time latency and leave enough time for image processing.
- Data augmentation: apply augmentation during training so the model adapts to varied imaging conditions.
- Improve post-processing: apply techniques such as non-maximum suppression (NMS) or thresholding to improve recognition accuracy.