[Canaan Technology CanMV K230 Review] Face detection, palm detection and gesture recognition
Face Detection
Face recognition is probably the most widespread application of computer vision: the camera in a home door lock, the access-control terminal of a residential compound or company, and the systems used to track suspects in public places such as airports are all practical applications of it. The first step of face recognition is to accurately detect the face in the image, and that first step, face detection, is the subject here.
A quick search shows that the most common and basic face detection algorithm is Haar feature detection. The K230, however, greatly simplifies the AI development process. As shown in the figure below, Canaan provides an official AI development framework in which all we need to do is complete five steps on the AI side: configure preprocessing, preprocess, run inference, post-process, and display the results. The AI code that follows basically revolves around these steps: a class is created, several methods are defined in it to implement the steps above, the class is instantiated, and the corresponding method is called at the appropriate time to realize each function.
To make AI-related development easier, the official firmware provides several packaged API interfaces: PipeLine, Ai2d, and AIBase. They cover image acquisition and display, the preprocessing-related interfaces, and the model-inference-related interfaces respectively. The AI vision routines below are all implemented with these interfaces.
Python has the concept of a class, and the first method of a class is usually __init__. This method runs automatically when the class is instantiated and binds the arguments passed in to the new instance; without an __init__ method there is no natural place to store that per-instance state, so it is important to define one. The Python used in the AI vision examples below is more involved, so it is worth brushing up on some Python basics along the way.
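To make the pattern concrete before reading the full routine, here is a minimal sketch of the class structure the framework expects (the names below are illustrative, not the official API): an __init__ that stores the parameters on the instance, plus one method for each of the five steps.

# Minimal sketch of the five-step class pattern (illustrative names only,
# not the official CanMV API)
class MyVisionApp:
    def __init__(self, kmodel_path, model_input_size):
        # __init__ runs automatically at instantiation and binds the
        # arguments to the new instance as attributes
        self.kmodel_path = kmodel_path
        self.model_input_size = model_input_size

    def config_preprocess(self):
        pass  # step 1: configure preprocessing (pad/resize/...)

    def preprocess(self, frame):
        pass  # step 2: apply the configured preprocessing to a frame

    def infer(self, tensor):
        pass  # step 3: run the kmodel

    def postprocess(self, results):
        pass  # step 4: turn raw outputs into boxes/labels

    def draw_result(self, pl, dets):
        pass  # step 5: display the results

app = MyVisionApp("/sdcard/app/tests/kmodel/face_detection_320.kmodel", [320, 320])
app.config_preprocess()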
Below we directly refer to the official code for analysis.
from libs.PipeLine import PipeLine, ScopedTiming
from libs.AIBase import AIBase
from libs.AI2D import Ai2d
import os
import ujson
from media.media import *
from time import *
import nncase_runtime as nn
import ulab.numpy as np
import time
import utime
import image
import random
import gc
import sys
import aidemo
# Custom face detection class, inheriting from the AIBase base class
class FaceDetectionApp(AIBase):
    def __init__(self, kmodel_path, model_input_size, anchors, confidence_threshold=0.5, nms_threshold=0.2, rgb888p_size=[224,224], display_size=[1920,1080], debug_mode=0):
        super().__init__(kmodel_path, model_input_size, rgb888p_size, debug_mode)  # call the base class constructor
        self.kmodel_path = kmodel_path  # model file path
        self.model_input_size = model_input_size  # model input resolution
        self.confidence_threshold = confidence_threshold  # confidence threshold
        self.nms_threshold = nms_threshold  # NMS (non-maximum suppression) threshold
        self.anchors = anchors  # anchor data used for object detection
        self.rgb888p_size = [ALIGN_UP(rgb888p_size[0], 16), rgb888p_size[1]]  # resolution the sensor feeds to the AI, width aligned up to 16
        self.display_size = [ALIGN_UP(display_size[0], 16), display_size[1]]  # display resolution, width aligned up to 16
        self.debug_mode = debug_mode  # whether debug mode is enabled
        self.ai2d = Ai2d(debug_mode)  # instantiate Ai2d for model preprocessing
        self.ai2d.set_ai2d_dtype(nn.ai2d_format.NCHW_FMT, nn.ai2d_format.NCHW_FMT, np.uint8, np.uint8)  # set Ai2d input/output format and type

    # Configure the preprocessing operations; pad and resize are used here. Ai2d supports crop/shift/pad/resize/affine; see /sdcard/app/libs/AI2D.py for details
    def config_preprocess(self, input_image_size=None):
        with ScopedTiming("set preprocess config", self.debug_mode > 0):  # timer, enabled when debug_mode is greater than 0
            ai2d_input_size = input_image_size if input_image_size else self.rgb888p_size  # defaults to the size the sensor feeds to the AI; can be overridden via input_image_size
            top, bottom, left, right = self.get_padding_param()  # get the padding parameters
            self.ai2d.pad([0, 0, 0, 0, top, bottom, left, right], 0, [104, 117, 123])  # pad the edges
            self.ai2d.resize(nn.interp_method.tf_bilinear, nn.interp_mode.half_pixel)  # resize the image
            self.ai2d.build([1,3,ai2d_input_size[1],ai2d_input_size[0]],[1,3,self.model_input_size[1],self.model_input_size[0]])  # build the preprocessing pipeline
    # Custom postprocessing for this task; results is the list of model output arrays. The face_det_post_process interface of the aidemo library is used here
    def postprocess(self, results):
        with ScopedTiming("postprocess", self.debug_mode > 0):
            post_ret = aidemo.face_det_post_process(self.confidence_threshold, self.nms_threshold, self.model_input_size[1], self.anchors, self.rgb888p_size, results)
            if len(post_ret) == 0:
                return post_ret
            else:
                return post_ret[0]
    # Draw the detection results on the screen
    def draw_result(self, pl, dets):
        with ScopedTiming("display_draw", self.debug_mode > 0):
            if dets:
                pl.osd_img.clear()  # clear the OSD image
                for det in dets:
                    # Convert the detection box coordinates to coordinates at the display resolution
                    x, y, w, h = map(lambda x: int(round(x, 0)), det[:4])
                    x = x * self.display_size[0] // self.rgb888p_size[0]
                    y = y * self.display_size[1] // self.rgb888p_size[1]
                    w = w * self.display_size[0] // self.rgb888p_size[0]
                    h = h * self.display_size[1] // self.rgb888p_size[1]
                    pl.osd_img.draw_rectangle(x, y, w, h, color=(255, 255, 0, 255), thickness=2)  # draw the rectangle
            else:
                pl.osd_img.clear()
    # Get the padding parameters
    def get_padding_param(self):
        dst_w = self.model_input_size[0]  # model input width
        dst_h = self.model_input_size[1]  # model input height
        ratio_w = dst_w / self.rgb888p_size[0]  # width scaling ratio
        ratio_h = dst_h / self.rgb888p_size[1]  # height scaling ratio
        ratio = min(ratio_w, ratio_h)  # take the smaller of the two ratios
        new_w = int(ratio * self.rgb888p_size[0])  # new width
        new_h = int(ratio * self.rgb888p_size[1])  # new height
        dw = (dst_w - new_w) / 2  # width difference
        dh = (dst_h - new_h) / 2  # height difference
        top = int(round(0))
        bottom = int(round(dh * 2 + 0.1))
        left = int(round(0))
        right = int(round(dw * 2 - 0.1))
        return top, bottom, left, right
if __name__ == "__main__":
    # Display mode, "hdmi" by default; "hdmi" or "lcd" can be selected
    display_mode = "hdmi"
    if display_mode == "hdmi":
        display_size = [1920, 1080]
    else:
        display_size = [800, 480]
    # Set the model path and other parameters
    kmodel_path = "/sdcard/app/tests/kmodel/face_detection_320.kmodel"
    # Other parameters
    confidence_threshold = 0.5
    nms_threshold = 0.2
    anchor_len = 4200
    det_dim = 4
    anchors_path = "/sdcard/app/tests/utils/prior_data_320.bin"
    anchors = np.fromfile(anchors_path, dtype=np.float)
    anchors = anchors.reshape((anchor_len, det_dim))
    rgb888p_size = [1920, 1080]
    # Initialize the PipeLine for the image processing flow
    pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
    pl.create()  # create the PipeLine instance
    # Initialize the custom face detection instance
    face_det = FaceDetectionApp(kmodel_path, model_input_size=[320, 320], anchors=anchors, confidence_threshold=confidence_threshold, nms_threshold=nms_threshold, rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
    face_det.config_preprocess()  # configure preprocessing
    clock = time.clock()
    try:
        while True:
            os.exitpoint()  # check for an exit signal
            clock.tick()
            img = pl.get_frame()  # get the current frame
            res = face_det.run(img)  # run inference on the current frame
            # When a face is detected, print the result
            if res:
                print(res)
            face_det.draw_result(pl, res)  # draw the results
            pl.show_image()  # display the results
            gc.collect()  # garbage collection
            print(clock.fps())  # print the frame rate
    except Exception as e:
        sys.print_exception(e)  # print the exception info
    finally:
        face_det.deinit()  # deinitialize
        pl.destroy()  # destroy the PipeLine instance
First, the code imports a large number of libraries, including the AI-related libraries and some basic modules.
Then a face detection class is defined. Five methods are created in this class: initialization, preprocessing configuration, postprocessing, result display, and getting the padding parameters.
Method - Initialization: this method calls the base class constructor and then stores the parameters passed in on the new instance. These parameters include the model path, the model input resolution, and so on.
Method - Preprocessing configuration: this method calls the Ai2d-related functions. Ai2d supports crop/shift/pad/resize/affine, i.e. cropping, shifting, and similar operations on the captured image; in this experiment only pad and resize are used. The method calls get_padding_param (defined later) to obtain the padding parameters, pads the image with them, resizes it to a size suitable for the model, and finally builds the preprocessing pipeline, completing the image preprocessing work.
Method - Postprocessing: this method calls the face detection postprocessing interface of the aidemo library, passing in the threshold information for detection. It returns the detection data if there is any; otherwise it returns an empty result.
Method - Result display: this method checks whether a face was detected. If so, it first clears the current OSD, i.e. what is drawn over the screen, then converts the returned coordinates into coordinates at the display resolution, and calls the drawing methods of pl, the PipeLine instance, to mark the detection results on the image for display.
Method - Get padding parameters: this method provides the parameters for the pad step of the preprocessing configuration. It takes the model input width and height, computes the scaling ratio against the sensor resolution, and from the size difference derives the top/bottom/left/right padding values, which are passed back to the preprocessing step to implement the pad operation.
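As a concrete check of this arithmetic with the values used in this routine (a 1920x1080 sensor image and a 320x320 model input), the padding works out as follows:

# Worked example of get_padding_param with rgb888p_size = [1920, 1080]
# and model_input_size = [320, 320], the values used in this routine
ratio_w = 320 / 1920             # 0.1667
ratio_h = 320 / 1080             # 0.2963
ratio = min(ratio_w, ratio_h)    # 0.1667: the width is the limiting side
new_w = int(ratio * 1920)        # 320: the frame fills the full model width
new_h = int(ratio * 1080)        # 180
dw = (320 - new_w) / 2           # 0: no horizontal padding needed
dh = (320 - new_h) / 2           # 70
bottom = int(round(dh * 2 + 0.1))  # 140: all vertical padding goes below
right = int(round(dw * 2 - 0.1))   # 0
# The 1920x1080 frame is scaled to 320x180, then 140 rows of padding are
# added at the bottom to produce the 320x320 tensor the kmodel expects.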
After the face detection class is complete, we instantiate it and call the corresponding AI-related operations to start recognition and detection.
The first step is to assign concrete values to the parameters, such as the model path, the thresholds, and the display size.
Then a PipeLine is instantiated as pl and created, which ensures that image acquisition, display, and drawing all work normally.
Next, the face detection class is instantiated. The parameters required by the __init__ method are passed in, and then the preprocessing configuration is performed. This preprocessing configuration actually uses two methods, the second (config_preprocess) and the fifth (get_padding_param).
Then we start acquiring frames and running inference on them.
First, the get_frame method of pl is used to grab an image frame into img, which is then passed to the face detection instance through its run method. run is clearly not defined in our class: it is inherited from the AIBase parent class, and it runs inference on the current frame.
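Conceptually, the inherited run method just chains the steps we have configured. A rough sketch of what it does internally (simplified pseudocode based on the framework description, not the actual /sdcard/app/libs/AIBase.py source):

# Simplified sketch of AIBase.run (illustrative only, not the real source)
def run(self, input_np):
    tensor = self.preprocess(input_np)  # apply the Ai2d pad/resize built in config_preprocess
    outputs = self.inference(tensor)    # run the kmodel on the KPU
    return self.postprocess(outputs)    # our overridden postprocess turns raw outputs into boxes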
When a face is detected, the result is printed. Then the fourth method of the face detection class, draw_result, is executed, and pl is called to display the result on screen.
This completes the AI face detection process.
Let's demonstrate the effect below. Here I pointed the camera at a photo on my phone, and the three people in it were all accurately detected and framed.
Later, we can carry out more AI vision-related development and learning based on this.
Palm Detection
By detecting and recognizing the palm, gesture recognition can be realized, which in turn allows gestures to be used for control operations.
The recognition flow of palm detection is much like that of face detection: define a new class in the palm detection program, implement the preprocessing configuration and the other related methods in it, instantiate the class in the main function, and then call its methods to run.
Compared with face detection, the first thing that needs to change is the model, since different recognition tasks require different models. Here I use the official model.
Then there is another parameter called anchors (anchor boxes). It stores a set of preset bounding boxes: during detection, the box shapes in anchors are used as starting points and offsets are predicted on top of them. The more pairs in anchors, therefore, the more box shapes can be matched, but the values should still be reasonable, with widths and heights roughly proportioned to the shape of a palm. The values provided in the routine are [26,27, 53,52, 75,71, 80,99, 106,82, 99,134, 140,113, 161,172, 245,276]. When the list is cut down to only the first pair, 26 and 27, the hand is only recognized at particular angles. More pairs in this list thus increase the probability of detection, but may also increase the probability of false detections.
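In other words, the flat list is read as nine (width, height) pairs of preset boxes; a quick illustrative reshape (plain Python, just for inspection) makes this visible:

# Read the flat anchor list as (width, height) pairs (illustration only)
anchors = [26,27, 53,52, 75,71, 80,99, 106,82, 99,134, 140,113, 161,172, 245,276]
anchor_boxes = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
print(anchor_boxes)
# [(26, 27), (53, 52), (75, 71), (80, 99), (106, 82), (99, 134),
#  (140, 113), (161, 172), (245, 276)]
# Nine box shapes roughly matching palm proportions; the detector predicts
# offsets from these presets rather than boxes from scratch.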
Below is the main code section for palm detection.
if __name__ == "__main__":
    # Display mode, "hdmi" by default; "hdmi" or "lcd" can be selected
    display_mode = "hdmi"
    if display_mode == "hdmi":
        display_size = [1920, 1080]
    else:
        display_size = [800, 480]
    # Model path
    kmodel_path = "/sdcard/app/tests/kmodel/hand_det.kmodel"
    # Other parameter settings
    confidence_threshold = 0.2
    nms_threshold = 0.5
    rgb888p_size = [1920, 1080]
    labels = ["hand"]
    anchors = [26,27, 53,52, 75,71, 80,99, 106,82, 99,134, 140,113, 161,172, 245,276]  # anchor settings
    # Initialize the PipeLine
    pl = PipeLine(rgb888p_size=rgb888p_size, display_size=display_size, display_mode=display_mode)
    pl.create()
    # Initialize the custom palm detection instance
    hand_det = HandDetectionApp(kmodel_path, model_input_size=[512,512], labels=labels, anchors=anchors, confidence_threshold=confidence_threshold, nms_threshold=nms_threshold, nms_option=False, strides=[8,16,32], rgb888p_size=rgb888p_size, display_size=display_size, debug_mode=0)
    hand_det.config_preprocess()
    clock = time.clock()
    try:
        while True:
            os.exitpoint()  # check for an exit signal
            clock.tick()
            img = pl.get_frame()  # get the current frame
            res = hand_det.run(img)  # run inference on the current frame
            hand_det.draw_result(pl, res)  # draw the results on the PipeLine's OSD image
            print(res)  # print the results
            pl.show_image()  # display the current drawing results
            gc.collect()  # garbage collection
            print(clock.fps())  # print the frame rate
    except Exception as e:
        sys.print_exception(e)
    finally:
        hand_det.deinit()  # deinitialize
        pl.destroy()  # destroy the PipeLine instance
The following is a demonstration of the effect.
Gesture Judgment
After palm recognition is implemented, each finger's key points are drawn. As shown in the figure below, the bending angles of the five fingers are obtained and the values are returned in a list.
The following code computes the angles and collects them in a list.
# For each of the five fingers, compute the angle between two 2D vectors
# built from the hand key points in results, and collect it in angle_list
for i in range(5):
    angle = self.hk_vector_2d_angle([(results[0] - results[i*8+4]), (results[1] - results[i*8+5])],
                                    [(results[i*8+6] - results[i*8+8]), (results[i*8+7] - results[i*8+9])])
    angle_list.append(angle)
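The helper hk_vector_2d_angle is presumably a standard angle-between-two-2D-vectors function. A minimal sketch of what it computes (my own illustration, not the routine's actual implementation):

import math

# Illustrative angle-between-2D-vectors helper (assumed behavior of
# hk_vector_2d_angle; the 65535. sentinel matches the validity check below)
def vector_2d_angle(v1, v2):
    norm1 = math.sqrt(v1[0] * v1[0] + v1[1] * v1[1])
    norm2 = math.sqrt(v2[0] * v2[0] + v2[1] * v2[1])
    if norm1 == 0 or norm2 == 0:
        return 65535.  # degenerate vector: report an invalid angle
    cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / (norm1 * norm2)
    cos_a = max(-1.0, min(1.0, cos_a))  # clamp against floating-point drift
    return math.degrees(math.acos(cos_a))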
Then each element of angle_list is compared against the angle thresholds, which is how gesture recognition is realized.
Indices 0 to 4 correspond to the thumb, index finger, middle finger, ring finger, and little finger respectively.
The parameter thr_angle is the bending-angle threshold: if a finger's angle exceeds this value, the finger is considered bent.
The parameter thr_angle_thumb is the thumb's bending-angle threshold: if the thumb's angle exceeds this value, the thumb is considered bent.
The parameter thr_angle_s is the straightened-angle threshold: if a finger's angle is below this value, the finger is considered extended.
The parameter gesture_str is the gesture string, which stores the name of the recognized gesture and is returned for display.
Let's analyze the gun and love gestures below.
In the gun gesture, the thumb and index finger are extended and the remaining three fingers are bent. The condition for the thumb and index finger should therefore be less than thr_angle_s, and for the other three fingers greater than thr_angle. Accordingly, in the code the first two conditions of that branch test for less than thr_angle_s and the last three test for greater than thr_angle.
In the love gesture, the thumb, index finger, and little finger are extended and the remaining two fingers are bent. The condition for the thumb, index finger, and little finger should therefore be less than thr_angle_s, and for the other two fingers greater than thr_angle. Accordingly, the first two and the last conditions of that branch test for less than thr_angle_s, and the middle two test for greater than thr_angle.
thr_angle, thr_angle_thumb, thr_angle_s, gesture_str = 65., 53., 49., None
if 65535. not in angle_list:
    if (angle_list[0]>thr_angle_thumb) and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
        gesture_str = "fist"
    elif (angle_list[0]<thr_angle_s) and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]<thr_angle_s):
        gesture_str = "five"
    elif (angle_list[0]<thr_angle_s) and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
        gesture_str = "gun"
    elif (angle_list[0]<thr_angle_s) and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]<thr_angle_s):
        gesture_str = "love"
    elif (angle_list[0]>5) and (angle_list[1]<thr_angle_s) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
        gesture_str = "one"
    elif (angle_list[0]<thr_angle_s) and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]<thr_angle_s):
        gesture_str = "six"
    elif (angle_list[0]>thr_angle_thumb) and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]>thr_angle):
        gesture_str = "three"
    elif (angle_list[0]<thr_angle_s) and (angle_list[1]>thr_angle) and (angle_list[2]>thr_angle) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
        gesture_str = "thumbUp"
    elif (angle_list[0]>thr_angle_thumb) and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]>thr_angle) and (angle_list[4]>thr_angle):
        gesture_str = "yeah"
return gesture_str
Next, we try to add an OK gesture: the thumb and index finger are bent and the other three fingers are extended. Testing shows, however, that the thumb has to bend to a large angle before recognition succeeds, so we lower the thr_angle_thumb threshold; reducing it gives noticeably better recognition.
    elif (angle_list[0]>thr_angle_thumb) and (angle_list[1]>thr_angle) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]<thr_angle_s):
        gesture_str = "OK"
Because gesture recognition relies on these angles, it may not be accurate when the hand is flipped or rotated. Let's also add a four gesture, in which the thumb is bent and the remaining four fingers are extended.
    elif (angle_list[0]>thr_angle_thumb) and (angle_list[1]<thr_angle_s) and (angle_list[2]<thr_angle_s) and (angle_list[3]<thr_angle_s) and (angle_list[4]<thr_angle_s):
        gesture_str = "four"