A system solution for autonomous robot arm grasping guided by machine vision

Publisher: Tianyun2021 | Last updated: 2022-12-14 | Source: elecfans

Abstract: To shorten execution time, reduce positioning error, and improve production efficiency in product classification and grasping tasks, and to overcome the limitation that traditional robots can only execute predefined trajectories, this paper proposes a robot grasping solution that incorporates a visual recognition system. The visual recognition system detects both the color and shape attributes of the target to be grasped and uses template matching to match the target object's contour; the grasping task is executed by a six-axis xArm robot; and an Eye-in-Hand vision-robot configuration is used to recognize and grasp multiple objects and palletize them according to set rules. Test results verify the effectiveness and robustness of the proposed solution.


01

Introduction
In recent years, human research has deepened in fields such as industrial modernization, deep-sea exploration, space exploration, environmental monitoring, telemedicine, and smart homes, and application needs have diversified accordingly.


High-speed, flexible, multi-degree-of-freedom robotic arms, like human arms, are gradually playing an important role in production and daily life, and the development of robotic arm control technology toward intelligence has become an inevitable trend.


In addition, vision is an important means by which humans observe and perceive their surroundings. According to statistics, as much as 75% of the information humans receive is obtained through sight, which fully demonstrates the importance of visual function. Combining machine vision technology with robotic arm control addresses tasks in which the arm must distinguish targets autonomously, as well as the arm's insufficient flexibility in complex working environments and tasks; this essentially improves the autonomy and intelligence of the robotic arm and enhances its adaptability and practicality.

To this end, this paper proposes a machine-vision-based intelligent grasping system for robotic arms: visual sensors take the place of human eyes for measurement and judgment, and a computer simulates human recognition criteria to analyze and understand images, finding the target objects and their locations. Once the work goal is clear, the corresponding analysis results are sent to the robotic arm, which is controlled to complete the grasping task autonomously.


02

Intelligent grasping system composition

2.1 Hardware System

As shown in Figure 2.1, the hardware of the intelligent grasping system consists of four parts: a computer, a robotic arm system, a vision system fixed to the end of the robotic arm (Eye-in-Hand), and an end effector. Compared with the Eye-to-Hand arrangement, the Eye-in-Hand arrangement lets the system's field of view move with the robotic arm, enlarging the range of visual recognition. During a grasping task, once the vision system recognizes the position and pose of the target, it transmits the data to the computer for processing (coordinate transformation, etc.), and the robotic arm system then grasps and places the target object.
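The coordinate transformation mentioned above can be sketched as chaining homogeneous transforms: a point detected in the camera frame is mapped through the hand-eye calibration transform and the current end-effector pose into the robot base frame. A minimal numpy sketch; the matrices below are illustrative placeholders, not calibrated values:

```python
import numpy as np

def to_base_frame(p_cam, T_base_ee, T_ee_cam):
    """Map a 3-D point from the camera frame to the robot base frame.

    p_cam     : (3,) point in camera coordinates (metres)
    T_base_ee : 4x4 pose of the end effector in the base frame
    T_ee_cam  : 4x4 hand-eye transform (camera in the end-effector frame)
    """
    p_h = np.append(p_cam, 1.0)              # homogeneous coordinates
    return (T_base_ee @ T_ee_cam @ p_h)[:3]

# Illustrative example: camera coincident with the end effector,
# end effector translated 0.5 m along the base x-axis.
T_ee_cam = np.eye(4)
T_base_ee = np.eye(4)
T_base_ee[0, 3] = 0.5
p_base = to_base_frame(np.array([0.1, 0.0, 0.3]), T_base_ee, T_ee_cam)
# p_base is the target expressed in the base frame: [0.6, 0.0, 0.3]
```

In the real system both transforms come from hand-eye calibration and the arm's forward kinematics rather than identity matrices.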


Figure 2.1 Composition of intelligent grasping system

The robotic arm system consists of a controller, a six-axis robotic arm, and an end effector (suction cup). The tasks a robotic arm can accomplish depend heavily on its structure, so this experiment uses an xArm six-axis robotic arm, whose workspace is large enough to meet the requirements of this grasping task. The arm interprets controller instructions to carry out motion planning and to switch the end effector (suction cup) on and off. The vision system uses an Intel RealSense D415 sensor for image acquisition, recognition, and positioning. The RealSense D415 is a depth camera launched by Intel in 2018, with a field of view of approximately 70 degrees and a 2-megapixel imager. Compared with other cameras of similar capability, the D415 is lightweight, accurate, easy to calibrate, and low-cost.


2.2 Software Framework

The software framework of the intelligent grasping system is mainly based on the robot operating system ROS.

ROS is an open source meta-operating system suitable for robot development, which mainly works in Linux environment. It integrates a large number of tools, libraries, and protocols, and provides the services that an operating system should have, including hardware abstraction, underlying device control, inter-process messaging, and package management, which greatly reduces the threshold for robot development. This article uses ROS as the basic software platform for operating the intelligent grasping system and strings together all the steps in the operation process, as shown in Figure 2.2.


Figure 2.2 Intelligent grasping system software framework

Vision system: the target recognition and positioning functions are developed with the computer vision library OpenCV, a lightweight, efficient, cross-platform computer vision and machine learning library that runs on Linux and other operating systems. This project uses OpenCV on the ROS platform under Linux to recognize and locate target objects (introduced in the next chapter).
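The contour template matching mentioned in the abstract can be illustrated with moment invariants, which is also the idea behind OpenCV's `cv2.matchShapes`. A numpy-only sketch computing the first Hu invariant of a binary mask, a quantity unchanged by translation and uniform scaling, so a template contour can be compared against detected contours regardless of where they sit in the image:

```python
import numpy as np

def hu1(mask):
    """First Hu moment invariant (eta20 + eta02) of a binary mask."""
    ys, xs = np.nonzero(mask)          # pixel coordinates of the shape
    m00 = len(xs)                      # area (zeroth moment)
    cx, cy = xs.mean(), ys.mean()      # centroid
    mu20 = ((xs - cx) ** 2).sum()      # central moments
    mu02 = ((ys - cy) ** 2).sum()
    # normalised: eta_pq = mu_pq / m00**((p+q)/2 + 1) -> /m00**2 for p+q=2
    return (mu20 + mu02) / m00 ** 2

def shape_distance(a, b):
    """Smaller distance means more similar shapes."""
    return abs(hu1(a) - hu1(b))

# A square template matches a translated square far better than a thin bar.
tmpl = np.zeros((20, 20), int); tmpl[2:8, 2:8] = 1
sq   = np.zeros((20, 20), int); sq[10:16, 12:18] = 1
bar  = np.zeros((20, 20), int); bar[5:6, 2:18] = 1
```

In practice OpenCV's `cv2.HuMoments` and `cv2.matchShapes` compute all seven invariants; this single-invariant version only illustrates the principle.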


Robotic arm control system: during motion planning, all nodes communicate through dedicated ROS topics. To command a grasp, an inverse kinematics node reads the position the end effector should reach and computes the required joint angles. This control information is published on a separate topic and read by the controller, which moves the arm. (For the theory of forward and inverse kinematics, see Chapters 3 and 4 of "Introduction to Robotics" in the RobotTechBook branch of the Zippen-Huang/RobotTechCooker GitHub repository; a kinematic model analysis of the arm will also follow in a later article.)
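The topic-based flow described above can be illustrated with a toy publish/subscribe registry. Plain Python stands in for rospy here, and the topic names and the hard-coded joint angles are hypothetical placeholders, since the actual names and the IK solution are project-specific:

```python
from collections import defaultdict

class TopicBus:
    """Toy stand-in for ROS topics: named channels with callbacks."""
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, msg):
        for cb in self._subs[topic]:
            cb(msg)

bus = TopicBus()
joint_log = []

# Controller node: receives joint-angle commands and drives the arm.
bus.subscribe("/xarm/joint_command", joint_log.append)

# IK node: converts a target end-effector pose into joint angles
# (the IK computation itself is omitted; the angles are placeholders).
def ik_node(pose_msg):
    joint_angles = [0.0, -0.5, 1.2, 0.0, 0.8, 0.0]
    bus.publish("/xarm/joint_command", joint_angles)

bus.subscribe("/vision/target_pose", ik_node)

# Vision node publishes a detected grasp pose; the message flows
# vision -> IK -> controller, mirroring Figure 2.2.
bus.publish("/vision/target_pose", {"x": 0.3, "y": 0.1, "z": 0.05})
```

In the real system each node would be a separate ROS process using `rospy.Publisher`/`rospy.Subscriber`, with typed messages instead of Python objects.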


In summary, after the vision system identifies and locates the target object, it publishes messages on the relevant topics to drive the robotic arm's motion planning tasks, such as grasping and placing.


03

Image recognition and positioning analysis

For a machine-vision-based robotic grasping system, the primary task is to accurately identify and locate the target object; this is a key factor in the final grasp success rate. To accurately obtain the target's coordinates in three-dimensional space, recognition and positioning follow the process shown in Figure 3.1, analyzed in detail below.


Figure 3.1 Identification and positioning process

3.1 Image Preprocessing

Because the environmental background is complex, the captured images contain a great deal of irrelevant interference, which affects image processing and recognition to some degree. Image preprocessing can filter out this interference and highlight the image features required for recognition.


Therefore, designing an image preprocessing algorithm to preprocess the images collected by Intel RealSense D415 can effectively highlight the feature parameters of the target objects to be detected under the environmental background, reduce the complexity of the target recognition algorithm, and improve the recognition and detection efficiency.


3.1.1 Image Grayscale

Image grayscaling converts a color image into a grayscale image, removing unnecessary channel features and reducing data dimensionality so that processing is more efficient. It therefore aids extraction of target features and segmentation of the target image, and is widely used in machine vision.
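Grayscale conversion is typically a weighted sum of the three color channels; the ITU-R BT.601 luma weights below are the ones OpenCV's `cv2.cvtColor` uses for RGB input. A minimal numpy sketch:

```python
import numpy as np

def to_gray(rgb):
    """Convert an (H, W, 3) RGB image to an (H, W) grayscale image.

    Uses the BT.601 luma weights 0.299 R + 0.587 G + 0.114 B,
    collapsing three channels into one (a 3x data reduction).
    """
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(float) @ weights     # contracts the channel axis

img = np.zeros((2, 2, 3))
img[0, 0] = [255, 255, 255]   # white pixel
img[0, 1] = [0, 255, 0]       # pure green pixel
gray = to_gray(img)
# white -> 255.0 (the weights sum to 1); green -> 0.587 * 255
```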


3.1.2 Image Filtering

The main purpose of filtering is to suppress the interference noise in the image while maintaining the original details of the image as much as possible. Its effect will have a direct impact on the subsequent image processing, so choosing a suitable filtering method is crucial.

Commonly used image filtering methods include median filtering, Gaussian filtering, KNN filtering, and maximum-homogeneity smoothing. In target detection applications, three methods are widely used for noise reduction:

Median filtering is a nonlinear method: each pixel is replaced by the median of the pixels in a window around it, which smooths the image while preserving edges.

Mean filtering is a classic linear method: each pixel is replaced by the average of its own grayscale value and those of its neighboring pixels. Mean filtering often degrades image quality, and the larger the chosen neighborhood, the blurrier the result.

Gaussian filtering, put simply, replaces each pixel with a weighted average of itself and its surrounding points, and performs well against Gaussian noise.

Comparison shows that mean filtering retains object information well but weakens edge information, which hurts subsequent contour detection; Gaussian filtering suppresses Gaussian noise but over-smooths, causing serious loss of image information and a blurred result. Under the high concentration of salt-and-pepper noise in the experimental environment, median filtering performs best: it effectively suppresses noise while avoiding loss of edge information.
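The conclusion above, that median filtering suppresses salt-and-pepper noise where mean filtering only smears it, can be checked with a small numpy sketch that applies both over 3x3 windows (border pixels are left unchanged for simplicity):

```python
import numpy as np

def filter3x3(img, reduce_fn):
    """Apply reduce_fn (e.g. np.median or np.mean) over 3x3 windows."""
    out = img.astype(float).copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = reduce_fn(img[i-1:i+2, j-1:j+2])
    return out

# A flat grey patch corrupted by a single "salt" (255) noise pixel.
img = np.full((5, 5), 100.0)
img[2, 2] = 255.0

med = filter3x3(img, np.median)   # outlier is voted out entirely
avg = filter3x3(img, np.mean)     # outlier is smeared over neighbours
# med[2, 2] == 100.0, while avg[2, 2] == (8*100 + 255)/9 ≈ 117.2
```

In practice OpenCV's `cv2.medianBlur` and `cv2.blur` implement the same operations far more efficiently; this loop version only makes the mechanism explicit.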

