Gesture interaction has been a hot topic in human-computer interaction in recent years. In vision-based gesture interaction in particular, a camera captures gesture information without physical contact, and the computer analyzes and interprets it to complete the interactive task. The approach is popular because it is natural and consistent with people's own behavioral habits. However, changes in hand shape during interaction and interference from the surrounding environment both affect how gestures are recognized and understood. Gesture recognition is therefore an important problem in computer vision and human-computer interaction, and applying this interaction style well in embedded systems is a challenging task.
Vision-based gesture recognition is usually divided into four steps: segmentation, representation, recognition, and application. Segmentation and recognition are the key and most difficult steps, and existing algorithms tend to be computationally heavy, with high time complexity, in both. Embedded devices are further limited in resources and computing power, so achieving real-time gesture interaction on an embedded system requires improving the traditional gesture recognition algorithms.
Under the condition of a single camera, and building on related work in gesture tracking, this paper proposes a gesture recognition method based on the structural features of the gesture, so as to meet the real-time, accuracy, and continuity requirements of human-computer interaction in embedded systems. The Camshift algorithm, which has low computational complexity and good performance, is used for tracking, and its tracking results serve as a reference for gesture recognition, greatly reducing the recognition workload. Recognition itself combines the tracking results with the morphological structural features of the gesture. Using the tracking result as a reference allows background regions irrelevant to the gesture to be removed from the image, while the morphological structural features allow recognition to operate on the circumscribed polygon of the gesture rather than on every point of its edge. Combining the two not only greatly reduces the computational complexity of recognition but also improves its accuracy. Moreover, recognition requires no training on the various gestures, which makes it more convenient and concise.
1 Related Work
Researchers have proposed many different solutions for gesture recognition. The most commonly used are statistical hidden Markov models (HMMs), genetic algorithms, and artificial neural networks. The advantage of the statistical HMM method is that it uses prior knowledge to establish causal relationships between visual features, in order to handle the uncertainty inherent in video processing. It can probabilistically model the dependencies between the different features, corresponding to multiple random variables, at each moment, and it also considers the transition probabilities between moments, so it reflects the temporal relationships between features well. However, it must maintain a sample library of a certain size, and using an HMM for gesture recognition involves a large amount of computation. Naturally, the larger the sample library, the closer its distribution is to the actual situation and the higher the recognition accuracy; data-smoothing techniques are also needed to inflate small probability values. The genetic algorithm discretizes the image, operates on the discrete points, and converts image recognition into a combinatorial optimization problem over a series of discrete points; but it cannot use network feedback information in a timely manner, its search is relatively slow, and it requires many training samples and long training times. Artificial neural networks form a complex information-processing network by extensively connecting a large number of simple processing units (neurons), imitating, to varying degrees, the information processing, storage, and retrieval functions of the human nervous system. They require few samples and are efficient; however, they require human participation in training, and recognition accuracy is affected by subjective factors.
In general, in human-computer interaction systems, gesture tracking and recognition should meet the following requirements:
a) Good real-time performance: avoid computing high-dimensional feature vectors, processing large arrays, and running complex searches.
b) Sufficient robustness: unaffected by rotation, translation, and scale changes of the recognized object, or by changes in camera perspective.
c) Continuity and automatic initialization of gesture tracking: tracking resumes automatically after failure, minimizing human intervention.
The gesture recognition and tracking method proposed in this paper no longer treats the four traditional steps as isolated stages. Instead, it links the gesture tracking results with the otherwise independent segmentation and recognition steps, setting the predicted gesture area obtained by tracking as the region of interest (ROI) for recognizing the next frame. Based on the Camshift algorithm, the position of the gesture in the next frame is predicted from its position and color information in the previous frame, relying mainly on color statistics. This requires little computation, so it suits embedded systems well while still giving very good tracking and prediction results. Segmenting and recognizing gestures within the ROI eliminates some of the background's interference with the gesture and greatly reduces the computational complexity of recognition. Since the edge lines of each gesture have distinct features, and these features are well reflected in the gesture's circumscribed polygon, a one-to-one mapping can be established between gestures and circumscribed polygons. By building a feature library of circumscribed polygons for different gestures and fitting a polygon to the segmented gesture, the gesture type can be determined simply by matching the extracted polygon against the library.
The gesture recognition method proposed in this paper consists of three main parts:
a) Gesture segmentation. The hand area is segmented from the scene and its region and contour are extracted. The hand area here is provided mainly by the tracking results of part c).
b) Fitting and matching of the gesture's circumscribed polygon. A polygon is fitted to the gesture contour extracted in a), the shape characteristics of the polygon are analyzed, the feature library is searched for an entry matching the fitted polygon's features, and the match is mapped to a specific gesture.
c) Gesture tracking. The hand area is located from color information and the image is spatially transformed. Statistical principles are used to predict the area where the hand may appear in the next frame, and the prediction is fed back to the gesture segmentation of part a).
The gesture recognition process is shown in Figure 1.
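As a rough illustration of how the three parts cooperate on each frame, the following C sketch chains them in a capture loop using the OpenCV C API (the same API as the data structure in Section 2.1.1); segment_hand(), match_defect_map(), and track_camshift() are hypothetical helpers standing in for parts a), b), and c), not functions defined by this paper:

#include <opencv/cv.h>
#include <opencv/highgui.h>

CvSeq *segment_hand(IplImage *roi_img);            /* part a), hypothetical */
int    match_defect_map(CvSeq *contour);           /* part b), hypothetical */
CvRect track_camshift(IplImage *frame, CvRect r);  /* part c), hypothetical */

int main(void)
{
    CvCapture *cap = cvCaptureFromCAM(0);
    CvRect roi = cvRect(0, 0, 640, 480);  /* first frame: search the whole image */
    IplImage *frame;

    while ((frame = cvQueryFrame(cap)) != NULL) {
        cvSetImageROI(frame, roi);                /* a) segment inside the ROI only */
        CvSeq *contour = segment_hand(frame);
        int gesture = match_defect_map(contour);  /* b) fit and match the polygon */
        cvResetImageROI(frame);
        roi = track_camshift(frame, roi);         /* c) predict the next-frame ROI */
        /* ... drive the application with 'gesture' ... */
    }
    cvReleaseCapture(&cap);
    return 0;
}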
2 Gesture Recognition Framework
Gesture recognition consists of two main parts: static gesture recognition and gesture tracking. The gesture recognition framework is shown in Figure 2. In the method proposed in this paper the two parts run in parallel: the recognition result is passed to the tracking part as the tracking object, and the tracking prediction is fed back to the recognition part as the ROI image area for static gesture recognition. This not only improves tracking efficiency and recognition accuracy, but also effectively unifies the two parts.
2.1 Static Gesture Recognition
Recognizing static gestures gives the system a basic understanding of the tracked object, laying the foundation for automatic tracking initialization and automatic recovery of tracking. First, the hand area must be segmented from the scene. This paper adopts a method based on fuzzy sets and fuzzy operations to extract the region and contour of the hand: by applying fuzzy operations to the background, motion, skin color, and other information in the spatial and temporal domains of the video stream, an accurate hand region is segmented.
Static gesture recognition is based on contour features. Edge detection is applied to the segmented hand to obtain the complete contour edge of the gesture. The preceding fuzzy-set operations yield a binary image of the gesture segmentation; there is always an edge between two adjacent areas with different gray values, and since an edge is a discontinuity in gray values, it is easily detected with derivative operators.
In this way, a complete contour edge can be obtained. As shown in Figure 3, the left side is the hand area and the right side is the contour of the gesture.
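For illustration, a minimal sketch of this contour-extraction step with the OpenCV C API follows; it assumes 'mask' holds the binary segmentation result of the fuzzy-set operations as an 8-bit single-channel image:

#include <math.h>
#include <opencv/cv.h>

CvSeq *extract_hand_contour(IplImage *mask, CvMemStorage *storage)
{
    CvSeq *contours = NULL;
    cvFindContours(mask, storage, &contours, sizeof(CvContour),
                   CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE, cvPoint(0, 0));

    /* With one hand in the ROI, the largest outer contour is the hand;
       smaller contours are residual noise blobs from segmentation. */
    CvSeq *hand = NULL;
    double best = 0.0;
    for (CvSeq *c = contours; c != NULL; c = c->h_next) {
        double a = fabs(cvContourArea(c, CV_WHOLE_SEQ, 0));
        if (a > best) { best = a; hand = c; }
    }
    return hand;
}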
Next, a circumscribed polygon is fitted to the extracted gesture contour. The fingertip search method of Kenji Oka and Yoichi Sato first scans a large search window to determine 20 candidate fingertip positions, then suppresses the candidates around the highest-scoring one, and finally removes candidates located in the middle of the finger according to certain rules. This method requires multiple pixel-by-pixel scans of the search area, which is computationally expensive, and the rule for removing candidates in the middle of the gesture is not robust. Reference [5] searches by traversing the curvature of the gesture contour: scanning the gesture at a fixed step in contour order finds the fingertips and yields the circumscribed polygon of the contour. However, this search must visit every point of the contour and perform a division at each point, which makes the algorithm too computationally intensive; moreover, when lighting changes leave the contour with many protruding edges, fingertip recognition becomes difficult. This paper proposes a search method for the circumscribed edges: by scanning the gesture contour at a fixed step in contour-point order, the circumscribed polygon of the contour is fitted, and the convexity-defect structure of the contour that satisfies Definition 1 is used as the judgment feature for gesture recognition.
2.1.1 Gesture Defect Map
Definition 1: A gesture defect map is a feature description consisting of the polygon circumscribing the gesture contour and the depth point corresponding to each edge of the polygon. The depth point of an edge is the contour point, on the stretch of contour corresponding to that edge, that is farthest from the edge. The data structure of the gesture defect map is defined as follows:
typedef struct CvConvexityDefect {
    CvPoint *start;        // contour point where the defect starts
    CvPoint *end;          // contour point where the defect ends
    CvPoint *depth_point;  // contour point within the defect farthest from the convex hull
    float depth;           // depth of the valley below the convex hull
} CvConvexityDefect;
As shown in Figure 4, the gesture contour defect map describes various gestures well: it can be mapped to different gestures by the number of edges of the circumscribed polygon and the valley depth corresponding to each edge. Here A, B, C, D, E, F, G are the edges of the circumscribed polygon of the gesture contour, and Da, Db, Dc, Dd, De, Df, Dg are the depths from each valley to its corresponding edge.
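The data structure above is the convexity-defect record of the OpenCV C API, so one plausible way to obtain a defect map is to let OpenCV fit the convex polygon and enumerate its defects. A sketch, assuming 'hand' is the contour extracted during segmentation (this paper's own fitting method is given below):

#include <stdio.h>
#include <opencv/cv.h>

void print_defect_map(CvSeq *hand, CvMemStorage *storage)
{
    CvSeq *hull    = cvConvexHull2(hand, storage, CV_CLOCKWISE, 0);
    CvSeq *defects = cvConvexityDefects(hand, hull, storage);

    for (int i = 0; i < defects->total; i++) {
        CvConvexityDefect *d = (CvConvexityDefect *)cvGetSeqElem(defects, i);
        /* d->start and d->end span one polygon edge (A, B, ... in Figure 4);
           d->depth_point is the valley bottom and d->depth its depth (Da, Db, ...). */
        printf("edge %d: valley depth %.1f\n", i, d->depth);
    }
}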
To obtain the gesture contour defect map, a polygon must first be fitted to the gesture contour to obtain its circumscribed polygon. This paper proposes a fitting method based on the concavity or convexity between adjacent points on the contour. The points on the contour are traversed once, the following quantities are evaluated, and the appropriate points are eliminated; the remaining points are the candidate vertices of the circumscribed polygon:
by = pnext.y - pcur.y    (1)
ay × bx - ax × by    (2)
ax = pcur.x - pprev.x, ay = pcur.y - pprev.y
bx = pnext.x - pcur.x, by = pnext.y - pcur.y
Where: pcur is the point on the contour line currently traversed; pprev and pnext represent the previous and next points of the current point respectively; ax and ay are the x and y coordinate value differences between the current point and the previous point respectively; bx and by are the x and y coordinate value differences between the current point and the next point respectively.
The fitting algorithm process based on the concave and convex shape of the contour line is as follows:
a) Sort all the points on the contour line by x-coordinate value, and find the maximum and minimum y-coordinate values maxY and minY of all the points.
b) Divide the sorted contour points into four parts. First split the contour into upper and lower halves by y coordinate; then split the upper half at the x coordinate of the point with y = maxY (denoted Xmaxy) into topLeft (upper left) and topRight (upper right), and split the lower half at the x coordinate of the point with y = minY (denoted Xminy) into bottomLeft (lower left) and bottomRight (lower right).
c) Traverse the four parts (topLeft, topRight, bottomLeft, bottomRight) separately. In the topLeft area, remove points with equation (1) < 0 and equation (2) > 0; in the topRight area, remove points with equation (1) < 0 and equation (2) < 0; in the bottomLeft area, remove points with equation (1) > 0 and equation (2) > 0; in the bottomRight area, remove points with equation (1) > 0 and equation (2) > 0. The points that remain are the vertices of the circumscribed polygon of the gesture contour (a sketch of this elimination test follows the list).
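The following C sketch illustrates the elimination test, with the sign conditions taken verbatim from step c); the Pt type and the quadrant encoding are illustrative assumptions, not part of the paper:

typedef struct { int x, y; } Pt;  /* illustrative contour-point type */

/* Returns 1 if 'cur' is kept as a candidate polygon vertex.
   quadrant: 0 = topLeft, 1 = topRight, 2 = bottomLeft, 3 = bottomRight. */
int keep_point(Pt prev, Pt cur, Pt next, int quadrant)
{
    int ax = cur.x - prev.x,  ay = cur.y - prev.y;
    int bx = next.x - cur.x,  by = next.y - cur.y;  /* by is equation (1) */
    int cross = ay * bx - ax * by;                  /* equation (2) */

    switch (quadrant) {
    case 0: return !(by < 0 && cross > 0);  /* topLeft */
    case 1: return !(by < 0 && cross < 0);  /* topRight */
    case 2: return !(by > 0 && cross > 0);  /* bottomLeft */
    case 3: return !(by > 0 && cross > 0);  /* bottomRight, as given in step c) */
    }
    return 1;
}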
The valley bottoms and valley depths of the gesture defect map are computed on top of the fitted circumscribed polygon. The stretch of contour corresponding to each edge of the polygon is traversed once more, and the point that maximizes the following distance is the valley bottom for that edge:
depth = |dx × dy0 - dy × dx0| × scale, with scale = 1/√(dx0² + dy0²)    (3)
Where: scale is the normalization value; hull_cur and hull_next are the start vertices of the currently traversed edge and of the next edge of the circumscribed polygon, respectively; dx0 and dy0 are the x and y coordinate differences along the current polygon edge; dx and dy are the x and y coordinate differences between the currently traversed contour point and hull_cur; depth is the distance from the traversed point to the corresponding edge. Its maximum value is the valley depth for the edge, and the corresponding point is the valley bottom.
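A sketch of this distance computation, following the symbol definitions above; it is the standard point-to-line distance, which is also what OpenCV's cvConvexityDefects computes internally:

#include <math.h>
#include <opencv/cv.h>

float valley_depth(CvPoint hull_cur, CvPoint hull_next, CvPoint p)
{
    float dx0 = (float)(hull_next.x - hull_cur.x);       /* along the polygon edge */
    float dy0 = (float)(hull_next.y - hull_cur.y);
    float scale = 1.0f / sqrtf(dx0 * dx0 + dy0 * dy0);   /* normalization value */
    float dx = (float)(p.x - hull_cur.x);                /* edge start to contour point */
    float dy = (float)(p.y - hull_cur.y);
    return fabsf(dx * dy0 - dy * dx0) * scale;           /* distance to the edge */
}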
This search yields the feature values of the gesture contour defect map. These feature values (the relationship between the polygon and the valley bottoms) can then be compared with those in the pre-built library to match gestures, mapping each gesture contour defect map to a specific gesture.
2.1.2 Gesture Matching
Gesture matching is based mainly on matching gesture defect maps. The feature values of a gesture defect map consist of the circumscribed polygon together with the positions and depths of the valley bottoms, as shown in Figure 5.
The number of fingers can be determined from the number of polygon edges and the extent of each edge, while the relationships and positions of the fingers can be determined from the depths and positions of the valleys. Because the analysis is based on the overall shape of the gesture, it has a certain robustness: even when lighting changes alter the gesture image, the gesture defect map itself does not change.
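As an illustration, a minimal matching sketch against a hypothetical feature library is shown below; the GestureFeature fields (edge count and number of valleys deeper than a threshold) are an assumption about how such a library could be encoded, not the paper's exact representation:

typedef struct {
    int gesture_id;    /* e.g. the gestures A..E of Section 3 */
    int num_edges;     /* edges of the circumscribed polygon */
    int deep_valleys;  /* valleys with depth above a threshold (finger gaps) */
} GestureFeature;

int match_gesture(const GestureFeature *lib, int lib_size,
                  int num_edges, int deep_valleys)
{
    for (int i = 0; i < lib_size; i++)
        if (lib[i].num_edges == num_edges &&
            lib[i].deep_valleys == deep_valleys)
            return lib[i].gesture_id;
    return -1;  /* no matching entry in the library */
}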
2.2 Gesture Tracking
Hand tracking is based mainly on the Camshift algorithm, making comprehensive use of the color, area, and contour features of the gesture image. Camshift is a generalization of the MeanShift algorithm: an effective statistical iterative algorithm that drifts the target point to a local maximum of the density function. The Camshift tracker is based on a color probability model. After a color histogram model of the tracked target is established, each video frame can be converted into a color probability distribution map, and the position and size of the search window are updated in every frame so that the tracker locates the center and size of the target. In this paper, Camshift is used for coarse positioning, that is, to determine the outer rectangle Rect of the current gesture area, as shown in Figure 6.
Rect is then used as the input region for the static gesture recognition described above, reducing the workload of image segmentation and the fuzzy operations.
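A sketch of this coarse-positioning step with cvCamShift, assuming 'backproject' is the hue back-projection of the current frame and 'rect' is the gesture window from the previous frame:

#include <opencv/cv.h>

CvRect coarse_locate(IplImage *backproject, CvRect rect)
{
    CvConnectedComp comp;
    CvBox2D box;
    CvTermCriteria crit = cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER,
                                         10, 1.0);  /* max 10 iterations (Section 4) */
    cvCamShift(backproject, rect, crit, &comp, &box);
    return comp.rect;  /* the outer rectangle Rect fed back to recognition */
}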
3 Gesture Interaction Demonstration System
This paper implements the proposed gesture recognition method based on the gesture contour defect map under Linux, and realizes a human-computer interaction demonstration system on the "Embedded Star" development board. The system has an 800 MHz processor and 256 MB of RAM, and acquires 640 × 480 true-color images in real time. It analyzes every frame captured by the camera and recognizes the gestures in the image in real time. The demonstration application is a jigsaw puzzle game operated entirely by changing gestures. Five static gesture states are recognized: A is a fist, B is an extended index finger, C is a V-shaped gesture, D is the middle three fingers extended, and E is five fingers open. Hand shapes A and E correspond to grasping and releasing: when the gesture is a fist, the puzzle piece under the hand is selected (like pressing the left mouse button) and can be dragged; after choosing a position, changing to gesture E releases the piece (like releasing the left mouse button). Gestures B, C, and D zoom in, zoom out, and rotate the piece, respectively.
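A sketch of the resulting gesture-to-command dispatch; the action helpers are hypothetical stand-ins for the demo's puzzle operations:

void grab_piece(void);     /* hypothetical demo actions */
void release_piece(void);
void zoom_in_piece(void);
void zoom_out_piece(void);
void rotate_piece(void);

enum Gesture { FIST, INDEX_FINGER, V_SIGN, THREE_FINGERS, FIVE_OPEN };  /* A..E */

void dispatch(enum Gesture g)
{
    switch (g) {
    case FIST:          grab_piece();     break;  /* A: like pressing the left button */
    case FIVE_OPEN:     release_piece();  break;  /* E: like releasing the left button */
    case INDEX_FINGER:  zoom_in_piece();  break;  /* B */
    case V_SIGN:        zoom_out_piece(); break;  /* C */
    case THREE_FINGERS: rotate_piece();   break;  /* D */
    }
}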
The effect of the demonstration system is shown in Figure 7.
4 Experimental Results and Analysis
To verify the accuracy and real-time performance of the proposed algorithm, monocular video images of gestures without any special markings were collected under laboratory lighting conditions. The experimental parameters were set as follows: the maximum number of Camshift iterations was 10, and the HSV color space settings used for gesture segmentation are shown in Table 1.
Table 1 HSV color space settings
The mathematical morphology operations performed on the binary gesture image use a 3 × 3 template for opening and a 5 × 5 template for closing. The noise threshold for gestures is set to 0.01. No human intervention occurs during gesture tracking.
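For reference, a sketch of this preprocessing with the OpenCV C API; the parameters h_lo..v_hi are placeholders standing in for the Table 1 bounds, which are not reproduced here:

#include <opencv/cv.h>

void preprocess(IplImage *frame, IplImage *mask,
                double h_lo, double s_lo, double v_lo,
                double h_hi, double s_hi, double v_hi)
{
    IplImage *hsv = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 3);
    cvCvtColor(frame, hsv, CV_BGR2HSV);
    cvInRangeS(hsv, cvScalar(h_lo, s_lo, v_lo, 0),
               cvScalar(h_hi, s_hi, v_hi, 0), mask);   /* Table 1 skin-color bounds */

    IplConvKernel *k3 = cvCreateStructuringElementEx(3, 3, 1, 1, CV_SHAPE_RECT, NULL);
    IplConvKernel *k5 = cvCreateStructuringElementEx(5, 5, 2, 2, CV_SHAPE_RECT, NULL);
    cvMorphologyEx(mask, mask, NULL, k3, CV_MOP_OPEN, 1);   /* 3x3 open: remove speckle */
    cvMorphologyEx(mask, mask, NULL, k5, CV_MOP_CLOSE, 1);  /* 5x5 close: fill small holes */

    cvReleaseStructuringElement(&k3);
    cvReleaseStructuringElement(&k5);
    cvReleaseImage(&hsv);
}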