#AI Challenge Camp Terminal# Based on RV1106 handwritten digit recognition deployment

xianhangCheng

This post was last edited by xianhangCheng on 2024-5-30 15:57

Physical picture

The picture above shows the RV1106 development board with the SC3336 camera. The board is compact and well made, has onboard SPI NAND flash, and ships with a factory test system. After receiving the board, you need to flash the system image yourself (the image is downloaded from the network disk).

The board offers good value for its price: despite its small size, it has 256MB of DDR3L onboard and can run yolov5 directly. The only drawback is that it gets quite hot when running inference.

Preparation

Reference Manual: Luckfox official website tutorial SPI NAND Flash image burning | LUCKFOX WIKI

Prepare tools in advance:

1. Download the RK Driver Assistant, DriverAssitant (click here to download).

2. Download and unzip the burning tool (click here to download).

3. Download the image file (the link is hidden in the original post).

4. Download VLC media player from the official VideoLAN website.

Note: It is best to turn off the firewall and virus and threat protection when downloading the RK Driver Assistant and the burning tool; otherwise the following problems may occur:

The driver is not available;
The board cannot be recognized.

As shown below:

Turn on the board and camera

Burn the image:

Reference: SPI NAND Flash Image Burning | LUCKFOX WIKI

First we need to flash the board. Luckfox provides official Linux images in two flavors: Ubuntu and buildroot. Here we choose the officially recommended buildroot image. The board used in this activity is the Max series, so we choose the luckfox_pico_pro_max_image image.

Log in:

Reference: SSH/Telnet Login | LUCKFOX WIKI

The latest firmware of the Luckfox Pico series has SSH enabled by default. Here we log in using a static IP via USB connection.

Next, turn off the firewall and set the IP address of the computer's RNDIS network adapter to 172.32.0.100 with subnet mask 255.255.0.0. The board is then reachable at 172.32.0.93 over SSH. Open the built-in PowerShell terminal, run the command below, and enter the password when prompted. The command format is ssh username@server-ip-address:

ssh root@172.32.0.93

Login account: root
Login password: luckfox
Static IP address: 172.32.0.93

Use the ls command to view the files.

The system will automatically identify the camera and generate the rkipc.ini file.

Download and install VLC media player.
Open VLC media player, go to Media -> Open Network Stream, and enter the default IP address: rtsp://172.32.0.93/live/0

This way you can see the images captured by the camera.

Note: For the subsequent deployment, we need to shut down the system's default rkipc program by running the RkLunch-stop.sh command.

Configure the development environment:

Reference document: SDK environment deployment (PC side) | LUCKFOX WIKI

This is more complicated and I don’t understand it either, so just follow the tutorial.

Compile and run:

export LUCKFOX_SDK_PATH=<Your Luckfox-pico Sdk Path>
mkdir build
cd build
cmake ..
make && make install

Upload the compiled luckfox_rtsp_opencv_demo/luckfox_rtsp_opencv, lib directory and model weight model.rknn to luckfox-pico, enter the folder and run.

File transfer commands:

# Transfer a file
scp model.rknn root@172.32.0.93:/root
# Transfer a directory
scp -r luckfox_rtsp_opencv_demo root@172.32.0.93:/root

Note: For the first connection, you need to enter yes to confirm, then enter the password luckfox to start the transmission.
There should be three files luckfox_rtsp_opencv lib model.rknn in the running directory.

Grant permissions and run.

chmod 755 luckfox_rtsp_opencv
./luckfox_rtsp_opencv ./model.rknn

Code implementation details:

Define a structure to store the predicted numbers and their probabilities

// Structure that stores a predicted digit and its probability
struct Prediction
{
	int digit;
	float probability;
};


// Global simple queue that stores predicted digits and their probabilities
std::vector<Prediction> predictions_queue;

Defines a function for loading a model, which is used to load model data from a specified binary file and ensure that resources are released after the operation is completed.

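// Function: load_model
// Parameters: filename - path of the model file to load; model_size - receives the model size in bytes
// Returns: pointer to the model data on success (the caller must free() it), NULL on failure
static unsigned char *load_model(const char *filename, int *model_size)
{
    // Open the file in binary mode
    FILE *fp = fopen(filename, "rb");
    if (fp == nullptr)
    {
        // Opening failed: report the error and return NULL
        printf("fopen %s fail!\n", filename);
        return NULL;
    }

    // Determine the file length
    fseek(fp, 0, SEEK_END);
    long model_len = ftell(fp);
    if (model_len <= 0)
    {
        printf("ftell %s fail!\n", filename);
        fclose(fp);
        return NULL;
    }

    // Allocate model_len bytes for the model data
    unsigned char *model = (unsigned char *)malloc(model_len);
    if (model == nullptr)
    {
        printf("malloc %ld bytes fail!\n", model_len);
        fclose(fp);
        return NULL;
    }

    // Rewind to the start of the file
    fseek(fp, 0, SEEK_SET);

    // Read the file contents into the allocated buffer
    if ((size_t)model_len != fread(model, 1, model_len, fp))
    {
        // Reading failed: report the error, then release the buffer and the file
        printf("fread %s fail!\n", filename);
        free(model);
        fclose(fp);
        return NULL;
    }

    // Store the number of bytes read in *model_size
    *model_size = (int)model_len;

    // Close the file
    fclose(fp);

    // Return the address of the model data
    return model;
}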
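
For comparison, here is a minimal sketch of the same loading step written with RAII, so the file handle and the buffer are released automatically on every path. The name load_model_bytes is hypothetical and not part of the original demo; the sketch assumes the whole model fits in memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the original demo): reads a whole binary
// file into a vector. Returns an empty vector if the file cannot be read.
std::vector<std::uint8_t> load_model_bytes(const std::string &filename)
{
    // Open in binary mode and start at the end so tellg() reports the size.
    std::ifstream file(filename, std::ios::binary | std::ios::ate);
    if (!file)
        return {};

    const std::streamsize size = file.tellg();
    if (size <= 0)
        return {};

    std::vector<std::uint8_t> buffer(static_cast<std::size_t>(size));
    file.seekg(0, std::ios::beg);
    if (!file.read(reinterpret_cast<char *>(buffer.data()), size))
        return {};

    // The ifstream closes itself when it goes out of scope.
    return buffer;
}
```

With this variant, buffer.data() and buffer.size() take the place of the malloc'd pointer and model_size, and no explicit free() or fclose() is needed.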

Define the function find_digit_contour to find the contour of the digit in the input image and perform jitter reduction and shape filtering.

// Function: find_digit_contour
// Purpose: find the contour of the digit in the input image, with jitter reduction and shape filtering
// Parameter: image - input image
// Returns: bounding box of the digit (an empty cv::Rect if no suitable contour is found)
cv::Rect find_digit_contour(const cv::Mat &image) {

   // Preprocess the image
   cv::Mat gray, blurred, edged;
   cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY); // convert to grayscale
   cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0); // Gaussian blur to reduce noise
   cv::Canny(blurred, edged, 30, 150); // edge detection

   // Apply morphological operations
   cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5));
   cv::dilate(edged, edged, kernel); // dilation
   cv::erode(edged, edged, kernel); // erosion

   // Find contours
   std::vector<std::vector<cv::Point>> contours;
   cv::findContours(edged, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

   if (contours.empty()) {
       return cv::Rect();
   }

   // Pick the largest contour
   auto largest_contour = std::max_element(contours.begin(), contours.end(),
                                           [](const std::vector<cv::Point>& a, const std::vector<cv::Point>& b) {
                                               return cv::contourArea(a) < cv::contourArea(b);
                                           });

   // Filter by contour area
   if (cv::contourArea(*largest_contour) < 10) {
       return cv::Rect();
   }

   // Filter by contour shape
   cv::Rect bounding_box = cv::boundingRect(*largest_contour);
   float aspect_ratio = static_cast<float>(bounding_box.width) / bounding_box.height;
   if (aspect_ratio < 0.2 || aspect_ratio > 3) {
       return cv::Rect();
   }

   // Contour stability check over the last 5 bounding boxes (including the current one).
   // The oldest box is dropped once 5 are stored, so the check runs on every frame
   // after warm-up (with "> 5" the history settles at 6 and the "== 5" test fires only once).
   static std::vector<cv::Rect> prev_bounding_boxes;
   if (prev_bounding_boxes.size() >= 5) {
       prev_bounding_boxes.erase(prev_bounding_boxes.begin());
   }
   prev_bounding_boxes.push_back(bounding_box);
   if (prev_bounding_boxes.size() == 5) {
       float avg_width = 0.0;
       float avg_height = 0.0;
       for (const auto& box : prev_bounding_boxes) {
           avg_width += box.width;
           avg_height += box.height;
       }
       avg_width /= prev_bounding_boxes.size();
       avg_height /= prev_bounding_boxes.size();
       float width_diff = std::abs(bounding_box.width - avg_width) / avg_width;
       float height_diff = std::abs(bounding_box.height - avg_height) / avg_height;
       if (width_diff > 0.1 || height_diff > 0.1) {
           return cv::Rect();
       }
   }

   // Enlarge the bounding box to include more relevant pixels
   bounding_box.x = std::max(0, bounding_box.x - 15);
   bounding_box.y = std::max(0, bounding_box.y - 15);
   bounding_box.width = std::min(image.cols - bounding_box.x, bounding_box.width + 30);
   bounding_box.height = std::min(image.rows - bounding_box.y, bounding_box.height + 30);

   // Return the bounding box of the largest contour
   return bounding_box;
}

This code finds the outline of the digit in the input image and applies jitter reduction and shape filtering. It consists of the following steps:

Preprocess the image: grayscale conversion, Gaussian blur, and edge detection.
Apply morphological operations (dilation followed by erosion) to remove noise and connect contours.
Find all external contours and select the largest one. Filter the largest contour by area and aspect ratio to discard small or oddly shaped contours.
Compare the bounding box with the average of the most recent bounding boxes and reject it if its width or height deviates by more than 10%.
Enlarge the bounding box by 15 pixels on each side (clamped to the image) to include more relevant pixels.
Return the final bounding box.
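
The jitter-reduction idea can be isolated from OpenCV as a small sliding-window filter. The sketch below is only an illustration: BoxSize and BoxStabilityFilter are hypothetical names, the window of 5 and the 10% tolerance mirror the constants in find_digit_contour, and box sizes are assumed to be positive.

```cpp
#include <cmath>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for cv::Rect so the sketch builds without OpenCV.
struct BoxSize
{
    int width;
    int height;
};

// Keeps the last `window` box sizes and reports whether a new box is close
// enough (relative tolerance) to the average of that window.
// Assumes window >= 1 and positive box sizes.
class BoxStabilityFilter
{
public:
    explicit BoxStabilityFilter(std::size_t window = 5, float tolerance = 0.1f)
        : window_(window), tolerance_(tolerance) {}

    // Returns true if the box should be accepted.
    bool accept(const BoxSize &box)
    {
        history_.push_back(box);
        if (history_.size() > window_)
            history_.pop_front(); // drop the oldest entry

        // Until the window is full there is not enough history to judge.
        if (history_.size() < window_)
            return true;

        float avg_w = 0.0f;
        float avg_h = 0.0f;
        for (const BoxSize &b : history_)
        {
            avg_w += b.width;
            avg_h += b.height;
        }
        avg_w /= history_.size();
        avg_h /= history_.size();

        const float w_diff = std::fabs(box.width - avg_w) / avg_w;
        const float h_diff = std::fabs(box.height - avg_h) / avg_h;
        return w_diff <= tolerance_ && h_diff <= tolerance_;
    }

private:
    std::size_t window_;
    float tolerance_;
    std::deque<BoxSize> history_;
};
```

Because the current box is part of the average, one outlier also affects the verdict for the next few frames; that is the price of such a simple filter.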

Define an image preprocessing function that prepares the cropped digit region for the subsequent digit recognition.

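// Function: preprocess_digit_region
// Purpose: preprocess the digit region image: grayscale conversion, binarization, color inversion, and resizing
// Parameter: region - input digit region image
// Returns: resized - preprocessed 28x28 single-channel 8-bit image (values 0-255; no float normalization is done here)
cv::Mat preprocess_digit_region(const cv::Mat &region) {
    // Convert the input image to grayscale
    cv::Mat gray;
    cv::cvtColor(region, gray, cv::COLOR_BGR2GRAY);

    // Binarize the grayscale image, letting the OTSU method choose the threshold
    cv::threshold(gray, gray, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // Threshold again with THRESH_BINARY_INV: pixels above 127 become 0, all others become 255.
    // The image is already binary, so this simply inverts it.
    cv::threshold(gray, gray, 127, 255, cv::THRESH_BINARY_INV);

    // Create a Mat of the same size as gray to hold the inverted image
    cv::Mat bitwized = cv::Mat::zeros(gray.size(), gray.type());

    // Invert the image: black becomes white and white becomes black
    cv::bitwise_not(gray, bitwized);

    // Invert every pixel once more by hand. This cancels the bitwise_not above,
    // so bitwized ends up equal to the THRESH_BINARY_INV result
    // (dark strokes on a light background become white on black).
    for (int i = 0; i < bitwized.rows; i++) {
        for (int j = 0; j < bitwized.cols; j++) {
            bitwized.at<uchar>(i, j) = 255 - bitwized.at<uchar>(i, j);
        }
    }

    // Resize to 28x28 using INTER_AREA interpolation to preserve image detail
    cv::Mat resized;
    cv::resize(bitwized, resized, cv::Size(28, 28), 0, 0, cv::INTER_AREA);

    // Return the resized image
    return resized;
}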

The main steps of this code are:

Convert the input color image to grayscale.
Binarize the grayscale image, using the OTSU method to determine the threshold automatically.
Apply an inverse threshold, which flips black and white in the binarized image.
Invert the colors with bitwise_not.
Invert the pixels once more in an explicit loop, which cancels the bitwise_not step.
Resize the image to 28x28 using the INTER_AREA interpolation method to preserve image details.
Return the preprocessed image.
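
To make the automatic threshold less of a black box, here is a minimal sketch of the idea behind Otsu's method: try every threshold and keep the one that maximizes the between-class variance. It is a plain C++ illustration, not OpenCV's implementation, and otsu_threshold is a hypothetical name.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Returns the threshold t (0..255) that maximizes the between-class variance
// when pixels <= t form one class and pixels > t form the other.
int otsu_threshold(const std::vector<std::uint8_t> &pixels)
{
    // Histogram of gray levels.
    std::array<double, 256> hist{};
    for (std::uint8_t p : pixels)
        hist[p] += 1.0;

    const double total = static_cast<double>(pixels.size());
    double sum_all = 0.0;
    for (int i = 0; i < 256; ++i)
        sum_all += i * hist[i];

    double weight_bg = 0.0; // number of pixels <= t
    double sum_bg = 0.0;    // sum of gray levels of those pixels
    double best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t)
    {
        weight_bg += hist[t];
        sum_bg += t * hist[t];
        if (weight_bg == 0.0)
            continue; // no pixels in the lower class yet
        const double weight_fg = total - weight_bg;
        if (weight_fg == 0.0)
            break; // every pixel is already in the lower class

        const double mean_bg = sum_bg / weight_bg;
        const double mean_fg = (sum_all - sum_bg) / weight_fg;
        const double between = weight_bg * weight_fg * (mean_bg - mean_fg) * (mean_bg - mean_fg);
        if (between > best_var)
        {
            best_var = between;
            best_t = t;
        }
    }
    return best_t;
}
```

For an image with dark strokes on light paper, the chosen threshold falls between the two gray-level clusters, which is why the first cv::threshold call can pass 0 as the threshold value.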

Post-processing stage:

deqnt_affine_to_f32 function: converts quantized INT8 data into floating point numbers.

output_normalization function: Normalize the output of the model and calculate the probability distribution of the output.

// Convert quantized INT8 data to floating point
// Parameters:
//   qnt: quantized integer value
//   zp: zero point, used as the zero-point offset
//   scale: scale factor that maps the quantized integer to the floating-point range
// Returns:
//   the dequantized floating-point value
static float deqnt_affine_to_f32(int8_t qnt, int32_t zp, float scale) {
    // (quantized value - zero point) * scale
    return ((float)qnt - (float)zp) * scale;
}

// Normalize the model output and pick the most likely digit
// Parameters:
//   output_attrs: output tensor attributes, including the zero point and scale factor
//   output: model output data, stored as INT8
//   out_fp32: receives the normalized floating-point output
static void output_normalization(rknn_tensor_attr* output_attrs, uint8_t *output, float *out_fp32)
{
    int32_t zp = output_attrs->zp; // zero point from the output tensor attributes
    float scale = output_attrs->scale; // scale factor from the output tensor attributes

    // Dequantize the INT8 output into floats (the raw bytes are reinterpreted as signed INT8)
    for(int i = 0; i < 10; i ++)
        out_fp32[i] = deqnt_affine_to_f32((int8_t)output[i],zp,scale);

    // Compute the sum of squares of the output
    float sum = 0;
    for(int i = 0; i < 10; i++)
        sum += out_fp32[i] * out_fp32[i];

    // Divide by the L2 norm so the vector has unit length (values end up in [-1, 1]);
    // skip the division if the norm is zero
    float norm = sqrt(sum);
    if (norm > 0)
    {
        for(int i = 0; i < 10; i++)
            out_fp32[i] /= norm;
    }

    // Print the output values
    printf("\n===================Output data values:===================\n");
    for (int i = 0; i < 10; ++i)
    {
        printf("%f ", out_fp32[i]);
    }
    printf("\n");

    // Find the digit with the largest value and record both
    float max_prob = -1.0;
    int predicted_digit = -1;
    // Compute the index of the maximum
    for (int i = 0; i < 10; ++i)
    {
        if (out_fp32[i] > max_prob)
        {
            max_prob = out_fp32[i];
            predicted_digit = i;
        }
    }

    // Push the predicted digit and its score onto the queue
    predictions_queue.push_back({predicted_digit, max_prob});

    // Print the predicted digit and its score
    printf("========Predicted digit: %d, Probability: %.2f========\n\n", predicted_digit, max_prob);
}

The deqnt_affine_to_f32 function converts the INT8 data to floating point numbers.
In the output_normalization function, the dequantized values are divided by their L2 norm, so the output vector has unit length: the squares of the values sum to 1. This is not a softmax, so the values themselves do not sum to 1. The normalized value for each digit is printed, the digit with the largest value is selected, and the prediction and its score are pushed onto the queue.
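
If values that really sum to 1 are wanted, a softmax over the dequantized scores is the usual choice. The sketch below is an alternative to the L2 normalization used in the post, not the author's method. It assumes the model outputs raw scores (logits); if the exported model already ends in a softmax layer, the dequantized values are probabilities as they are. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dequantize affine-quantized INT8 values: real = (q - zero_point) * scale.
std::vector<float> dequantize(const std::vector<std::int8_t> &q, std::int32_t zp, float scale)
{
    std::vector<float> out;
    out.reserve(q.size());
    for (std::int8_t v : q)
        out.push_back((static_cast<float>(v) - static_cast<float>(zp)) * scale);
    return out;
}

// Numerically stable softmax: subtract the maximum before exponentiating.
std::vector<float> softmax(const std::vector<float> &logits)
{
    std::vector<float> probs(logits.size());
    if (logits.empty())
        return probs;
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i)
    {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float &p : probs)
        p /= sum;
    return probs;
}

// Index of the largest element, or -1 for an empty vector.
int argmax(const std::vector<float> &values)
{
    if (values.empty())
        return -1;
    return static_cast<int>(std::max_element(values.begin(), values.end()) - values.begin());
}
```

Both normalizations keep the order of the scores, so the predicted digit is the same; only the number shown as "probability" changes.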

Define the inference function, which converts the input image data into the format required by the model and runs inference to obtain the output. It performs the following steps:

Get the input and output attributes of the model.
Process the input image and convert it into the format required by the model.
Create and set memory for the input and output tensors.
Run model inference.
Process the output and get the predicted digit.
Free the allocated memory.

// run_inference takes a cv::Mat image frame as input
int run_inference(cv::Mat &frame)
{
    int ret = 0; // return value, initialized to 0
    rknn_input_output_num io_num; // holds the number of input and output tensors

    // Query the number of model inputs and outputs
    rknn_query(ctx, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num));

    // Initialize the input attribute array
    rknn_tensor_attr input_attrs[io_num.n_input];
    memset(input_attrs, 0, io_num.n_input * sizeof(rknn_tensor_attr));
    for (uint32_t i = 0; i < io_num.n_input; i++)
    {
        input_attrs[i].index = i; // index of this input
        // Query the detailed attributes of this input
        ret = rknn_query(ctx, RKNN_QUERY_INPUT_ATTR, &(input_attrs[i]), sizeof(rknn_tensor_attr));
        if (ret < 0)
        {
            printf("rknn_query fail! ret=%d\n", ret);
            return -1; // the query failed
        }
        dump_tensor_attr(&input_attrs[i]); // print the input attributes
    }

    printf("output tensors:\n");
    // Initialize the output attribute array
    rknn_tensor_attr output_attrs[io_num.n_output];
    memset(output_attrs, 0, io_num.n_output * sizeof(rknn_tensor_attr));
    for (uint32_t i = 0; i < io_num.n_output; i++)
    {
        output_attrs[i].index = i; // index of this output
        // Query the detailed attributes of this output
        ret = rknn_query(ctx, RKNN_QUERY_NATIVE_OUTPUT_ATTR, &(output_attrs[i]), sizeof(rknn_tensor_attr));
        if (ret != RKNN_SUCC)
        {
            printf("rknn_query fail! ret=%d\n", ret);
            return -1; // the query failed
        }
        dump_tensor_attr(&output_attrs[i]); // print the output attributes
    }

    printf("Gray image size: %dx%d\n", frame.rows, frame.cols);
    printf("Gray image type: %d\n", frame.type());

    // Create the input tensor memory
    rknn_tensor_mem *input_mems[1];
    input_attrs[0].type = input_type; // input data type
    input_attrs[0].fmt = input_layout; // input layout
    input_mems[0] = rknn_create_mem(ctx, input_attrs[0].size_with_stride);

    // Copy the input data into the input tensor memory
    int width = input_attrs[0].dims[2];
    int stride = input_attrs[0].w_stride;
    if (width == stride)
    {
        memcpy(input_mems[0]->virt_addr, frame.data, width * input_attrs[0].dims[1] * input_attrs[0].dims[3]);
    }
    else
    {
        // Rows in the tensor memory are padded to the stride, so copy row by row
        int height = input_attrs[0].dims[1];
        int channel = input_attrs[0].dims[3];
        uint8_t *src_ptr = frame.data;
        uint8_t *dst_ptr = (uint8_t *)input_mems[0]->virt_addr;
        int src_wc_elems = width * channel;
        int dst_wc_elems = stride * channel;
        for (int h = 0; h < height; ++h)
        {
            memcpy(dst_ptr, src_ptr, src_wc_elems);
            src_ptr += src_wc_elems;
            dst_ptr += dst_wc_elems;
        }
    }

    // Create the output tensor memory
    rknn_tensor_mem *output_mems[io_num.n_output];
    for (uint32_t i = 0; i < io_num.n_output; ++i)
    {
        output_mems[i] = rknn_create_mem(ctx, output_attrs[i].size_with_stride);
    }

    // Bind the input tensor memory
    ret = rknn_set_io_mem(ctx, input_mems[0], &input_attrs[0]);
    if (ret < 0)
    {
        printf("rknn_set_io_mem fail! ret=%d\n", ret);
        return -1;
    }

    // Bind the output tensor memory
    for (uint32_t i = 0; i < io_num.n_output; ++i)
    {
        ret = rknn_set_io_mem(ctx, output_mems[i], &output_attrs[i]);
        if (ret < 0)
        {
            printf("rknn_set_io_mem fail! ret=%d\n", ret);
            return -1;
        }
    }

    // Run inference
    ret = rknn_run(ctx, nullptr);
    if (ret < 0)
    {
        printf("rknn_run failed! ret=%d\n", ret);
        return -1;
    }

    // Read the result directly from the output tensor memory; the scores go into a small stack array
    uint8_t *output = (uint8_t *)output_mems[0]->virt_addr;
    float out_fp32[10];

    // Get the predicted digit
    output_normalization(&output_attrs[0], output, out_fp32);

    // Release memory
    rknn_destroy_mem(ctx, input_mems[0]);
    for (uint32_t i = 0; i < io_num.n_output; ++i)
    {
        rknn_destroy_mem(ctx, output_mems[i]);
    }

    return 0;
}
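
The row-by-row copy that run_inference uses when the tensor's row stride is wider than the image can be isolated as a small function. The sketch below is a standalone illustration with a hypothetical name, assuming stride >= width and a destination buffer of height * stride * channels bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies a tightly packed image (width * channels bytes per row) into a
// destination whose rows are `stride` pixels wide. Padding bytes at the end
// of each destination row are left untouched.
void copy_rows_with_stride(const std::uint8_t *src, std::uint8_t *dst,
                           int width, int height, int channels, int stride)
{
    const std::size_t src_row_bytes = static_cast<std::size_t>(width) * channels;
    const std::size_t dst_row_bytes = static_cast<std::size_t>(stride) * channels;
    for (int h = 0; h < height; ++h)
    {
        std::memcpy(dst, src, src_row_bytes);
        src += src_row_bytes; // next packed source row
        dst += dst_row_bytes; // next padded destination row
    }
}
```

For a 3-pixel-wide, 2-row, single-channel image copied into rows of stride 4, the destination holds three image bytes followed by one padding byte per row.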

The main function detects and displays digits in the real-time video stream, then encodes the video and sends it to the RTSP server. It includes the following steps:

Initialize the necessary modules, such as rkaiq, rkmpi, and rtsp.
Bind the VI (Video Input) and VPSS (Video Process Sub-System) modules through rkmpi for video processing and encoding.
Initialize the VENC (Video Encoder) module for video encoding.
Load and initialize the rknn model for handwritten digit recognition.
In a loop, acquire frames from VPSS, run digit detection and prediction, then encode the frames as H264 and send them to the RTSP stream.
Update the frame rate (fps) and display it on the image.
At the end of each loop iteration, release the frame resources and memory.
Finally, release the resources of all modules, destroy the rknn model, and exit the program.
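
The post does not show how the frame rate is computed, so the following is only a sketch of one common approach: count frames and divide by the elapsed time once per interval. FpsMeter and steady_now_seconds are hypothetical names, and the one-second interval is an assumption.

```cpp
#include <chrono>

// Hypothetical helper: estimates frames per second from caller-supplied timestamps.
class FpsMeter
{
public:
    // `now_seconds` is a monotonically increasing timestamp in seconds.
    // Returns the most recent estimate (0 until one full interval has elapsed).
    double tick(double now_seconds)
    {
        if (!started_)
        {
            // The first call only marks the start of the measurement window.
            started_ = true;
            window_start_ = now_seconds;
            return fps_;
        }
        ++frames_;
        const double elapsed = now_seconds - window_start_;
        if (elapsed >= interval_)
        {
            fps_ = frames_ / elapsed;
            frames_ = 0;
            window_start_ = now_seconds;
        }
        return fps_;
    }

private:
    double interval_ = 1.0; // seconds between updates of the estimate
    double window_start_ = 0.0;
    double fps_ = 0.0;
    int frames_ = 0;
    bool started_ = false;
};

// Current time in seconds from a monotonic clock, suitable for FpsMeter::tick.
double steady_now_seconds()
{
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock::now().time_since_epoch()).count();
}
```

In the capture loop this would be called once per frame, for example meter.tick(steady_now_seconds()), and the returned value drawn onto the frame.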

First, define the image resolution:

int width = 640;
int height = 480;

The rest of this section covers the logic inside the while loop.

1. Wrap the camera frame in an OpenCV Mat and call find_digit_contour to locate the digit's contour, with jitter reduction and shape filtering.

void *data = RK_MPI_MB_Handle2VirAddr(stVpssFrame.stVFrame.pMbBlk);
cv::Mat frame(height, width, CV_8UC3, data);

cv::Rect digit_rect = find_digit_contour(frame);

2. If digit_rect.area() > 0, crop the digit region with OpenCV and preprocess it, then pass the preprocessed image to run_inference.

cv::Mat digit_region = frame(digit_rect);
cv::Mat preprocessed = preprocess_digit_region(digit_region);
int prediction = run_inference(preprocessed);

3. Display the recognition results on the current frame.

// Fetch the predicted digit and its probability from predictions_queue.
// If the queue is not empty, take its last element as the recognition result for the current frame.
if (!predictions_queue.empty())
{
	Prediction prediction = predictions_queue.back();

	cv::rectangle(frame, digit_rect, cv::Scalar(0, 255, 0), 2);
	// Draw the predicted digit on the image: font scale 1, color Scalar(255, 0, 0), thickness 2
	cv::putText(frame, std::to_string(prediction.digit), cv::Point(digit_rect.x, digit_rect.y - 10),
	cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(255, 0, 0), 2);

	// Draw the predicted probability on the image
	cv::putText(frame, std::to_string(prediction.probability), cv::Point(digit_rect.x+ 30, digit_rect.y - 10),
	cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(230, 0, 0), 2);

	// Remove the element that was just read so the next iteration picks up a fresh result
	predictions_queue.pop_back();
}
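
One small refinement for the overlay: std::to_string on a float always prints six decimals (for example "0.987650"), which is long for on-frame text. The sketch below builds a shorter label with two decimals. format_prediction_label is a hypothetical helper, not part of the original code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds an overlay label such as "7 (0.99)".
std::string format_prediction_label(int digit, float probability)
{
    char buffer[32];
    // %.2f rounds the probability to two decimals.
    std::snprintf(buffer, sizeof(buffer), "%d (%.2f)", digit, probability);
    return std::string(buffer);
}
```

The returned string could be passed to a single cv::putText call in place of the two separate calls above.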

4. Finally, copy the current frame's image data into the RTSP frame buffer:

memcpy(data, frame.data, width * height * 3);

Actual demonstration effect

I had nothing to mount the camera on, so I held it in my hand while recording the screen, and the video is a bit shaky.

Even so, recognition works well: all digits from 0 to 9 are recognized correctly, with no recognition errors.

20240530_004503

Attached reference code: the link is hidden in the original post.

wangerxian

The analysis is good, and very valuable for reference!

wangerxian

The recognition rate of the model I'm running is very low. How did you tune the parameters? Could you help me analyze it?

秦天qintian0303

However, this can only recognize one digit at a time. It would be better if all the digits in the picture could be recognized.
