It's time to give the large model a body.
Recently, Orbbec's R&D team has combined a robotic arm with large models: using speech and vision-language large models, supplemented by data from the Orbbec Gemini 2 series depth cameras, they have created a robotic arm that can understand and perform voice-issued tasks.
The project builds on work by Professor Fei-Fei Li's team at Stanford University. By solving a series of engineering problems in generalization, observation, and control, the project brings the robotic arm based on multimodal large models from the simulated environment into the real world, expanding the application potential of intelligent robotic arms.
Integrating multiple large-model capabilities
Letting the robotic arm understand and execute voice commands
Since last year, the emergence of various large models has triggered a new wave of development in the robotics industry. Although "large models + robots" is still in the early stages of technical exploration, as the two are integrated more deeply, robots are expected to gain a smarter "brain", more capable "eyes", and a stronger "body", evolving toward embodied intelligence.
The large-model robotic arm created by Orbbec takes voice prompts as input and uses the language understanding and visual perception capabilities of multiple large models to generate spatial semantics, allowing the arm to understand instructions and execute the corresponding actions.
First, the robotic arm uses a speech large model to recognize the task issuer's voice commands; at the same time, it acquires high-quality RGB and depth data of the environment through two Orbbec Gemini 2 binocular structured-light cameras; it then uses vision and vision-language models such as SAM and CLIP to understand the scene, performs real-time collision detection, and finally executes the task.
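The article does not publish the implementation, but the flow it describes can be outlined in code. The following minimal Python sketch shows one way such a voice-to-action pipeline could be wired together; every function in it is a hypothetical placeholder for the corresponding component (speech model, SAM/CLIP grounding, motion planner), not an Orbbec API.

```python
"""Minimal sketch of the voice-to-action pipeline described above.

Every function below is a hypothetical placeholder standing in for a real
component (speech model, SAM + CLIP grounding, motion planner); none of
this is an Orbbec API.
"""
from dataclasses import dataclass


@dataclass
class Pose:
    """A 3D position recovered from the depth camera (metres)."""
    x: float
    y: float
    z: float


def transcribe(audio) -> str:
    # Placeholder for the speech large model.
    return "put the red cube into the yellow box"


def ground_query(instruction: str, rgb, depth) -> tuple[Pose, Pose]:
    # Placeholder for SAM segmentation + CLIP matching + depth lookup,
    # which turn the phrase into grasp and place poses.
    return Pose(0.30, 0.10, 0.05), Pose(0.45, -0.20, 0.05)


def plan_motion(grasp: Pose, place: Pose) -> list[Pose]:
    # Placeholder planner: a straight-line, two-waypoint trajectory.
    return [grasp, place]


def collides(trajectory: list[Pose], depth) -> bool:
    # Placeholder for the real-time, depth-based collision check.
    return False


def run_voice_command(audio, rgb, depth) -> None:
    instruction = transcribe(audio)
    grasp, place = ground_query(instruction, rgb, depth)
    trajectory = plan_motion(grasp, place)
    if collides(trajectory, depth):
        raise RuntimeError("planned trajectory intersects an obstacle")
    print(f"executing: {instruction!r} via {len(trajectory)} waypoints")


run_voice_command(audio=None, rgb=None, depth=None)
```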
Based on this pipeline, Orbbec enables the robotic arm to complete a series of instructions, such as:
Please remember the current status
Put the red cube into the yellow box
Put the green square into the white box
Rotate the blue block 30° counterclockwise
Move the blue block 10cm towards the green block
Put the blue block on top of the green block
Please restore the original state
Please put all the blocks into the yellow box
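To make the grounding step concrete: commands like "Put the red cube into the yellow box" require matching a language phrase against candidate objects in the image. Below is a minimal sketch of that step using the publicly available CLIP model via Hugging Face transformers; the synthetic color swatches stand in for crops produced by a segmenter such as SAM, and this illustrates the general technique rather than Orbbec's implementation.

```python
# Sketch: score candidate object crops (e.g. SAM mask crops) against a
# language query with CLIP. Illustrative only; not Orbbec's implementation.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical crops of segmented objects; here, synthetic stand-ins.
crops = [Image.new("RGB", (224, 224), c) for c in ("red", "yellow", "green")]

inputs = processor(text=["a red cube"], images=crops,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_text   # shape: (1, num_crops)

best = logits.softmax(dim=-1).argmax().item()
print(f"crop {best} best matches the query")   # expected: the red crop
```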
At present, the project has established a 1.0-stage baseline for applying and deploying multimodal large models on robotic arms. Orbbec is further optimizing multimodal command understanding, multi-sensor fusion perception, robotic arm trajectory planning and control, and end-effector grasping control, and will in the future launch large-model robots that are more intelligent, more flexible, and suited to more complex operating scenarios.
Overcoming the generalization, observation, and control challenges
From simulation to reality
At present, much of the research on robot agents, in China and abroad, is carried out in simulation environments. Moving from virtual simulation to the real world requires overcoming a series of engineering problems. For example, in a simulation environment the camera follows an ideal imaging model and is unaffected by imaging distortion, ambient lighting, and so on, which poses a challenge to the agent's ability to generalize to real scenes.
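As one concrete example of this gap: real lenses introduce radial and tangential distortion that an idealized simulation camera does not have, and a common mitigation is to calibrate the camera and undistort each frame before it reaches the perception stack. The OpenCV snippet below shows the idea; the intrinsics and distortion coefficients are illustrative values, not Gemini 2 calibration data.

```python
# Sketch: undistort a camera frame with OpenCV before perception.
# K and dist are illustrative values only; a real system would obtain
# them from calibration (e.g. cv2.calibrateCamera).
import cv2
import numpy as np

K = np.array([[600.0,   0.0, 320.0],      # fx, 0,  cx
              [  0.0, 600.0, 240.0],      # 0,  fy, cy
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.12, 0.05, 0.001, 0.001, 0.0])   # k1, k2, p1, p2, k3

frame = np.zeros((480, 640, 3), dtype=np.uint8)     # stand-in for a real frame
undistorted = cv2.undistort(frame, K, dist)
print(undistorted.shape)                            # same size, distortion removed
```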
Building on a pre-trained multimodal robotic arm model, Orbbec's R&D team overcame a series of implementation difficulties in generalization, observation, and control:
To achieve fast and accurate voice input and understanding, a pre-trained speech large model is introduced so that the robotic arm responds promptly to voice commands.
To ensure the robotic arm has sufficient generalization capability in the real world, a vision-language large model is used so that the arm can understand and adapt to complex scenes and perform tasks robustly in diverse environments.
To bridge the gap between the ideal camera imaging assumed by the pre-trained model and real cameras, a new calibration scheme was designed and the camera's auto-exposure (AE) strategy optimized, addressing challenges from ambient light, imaging distortion, and perspective deformation and making the robotic arm more robust.
To improve the robotic arm's safety in complex environments, depth-camera collision detection and grasp correction are introduced to optimize arm control and improve grasping performance, accuracy, and scene adaptability (a simplified sketch of the collision check follows this list).
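The article does not describe how the depth-based collision check works, but a deliberately simplified version of the idea can be sketched as follows: project a planned waypoint into the depth image and flag it if it would pass closer to an observed surface than a safety margin. All names and numbers below are hypothetical; a real system would check the whole arm geometry, not a single point.

```python
import numpy as np


def waypoint_collides(depth_m: np.ndarray, u: int, v: int, wp_depth_m: float,
                      margin_m: float = 0.05, window: int = 5) -> bool:
    """Flag a waypoint whose projected pixel (u, v) would pass closer to the
    observed surface than a safety margin. Deliberately simplified; a real
    system would test the whole arm geometry, not a single point."""
    h, w = depth_m.shape
    v0, v1 = max(0, v - window), min(h, v + window + 1)
    u0, u1 = max(0, u - window), min(w, u + window + 1)
    patch = depth_m[v0:v1, u0:u1]
    valid = patch[patch > 0]          # zeros are missing depth readings
    if valid.size == 0:
        return False                  # no data: cannot confirm a collision
    # Collision if the waypoint sits at or beyond the nearest observed surface.
    return wp_depth_m >= valid.min() - margin_m


# Toy usage with a synthetic depth map (values in metres).
depth = np.full((480, 640), 1.2, dtype=np.float32)
depth[200:280, 300:380] = 0.6         # an obstacle 0.6 m from the camera
print(waypoint_collides(depth, u=320, v=240, wp_depth_m=0.7))   # True
print(waypoint_collides(depth, u=50,  v=50,  wp_depth_m=0.7))   # False
```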
Through the introduction of and innovation on these key technologies, Orbbec has overcome the cross-domain difficulties of multimodal robotic arms and cleared the "last mile" toward engineering deployment.
In the field of robot vision, Orbbec has more than eight years of industry experience and has served more than 100 robotics companies. Through years of cooperation, it has accumulated deep experience in robot 3D sensors, LiDAR, models, and more, helping robotics customers rapidly develop innovative applications and reach mass production.
Laying out multimodal visual large models
Exploring the application potential of embodied robots
In what scenarios can a robotic arm that integrates multiple large model capabilities be used?
As robots' "eyes" (vision sensors), "brain" (large models), and "body" (the robot platform itself) continue to develop and evolve, intelligent robots and robotic arms are expected to be deployed first in scenarios such as manufacturing, flexible logistics, and commercial services.
For example, in an automated factory, a robotic arm based on a multimodal large model can be combined with an unmanned vehicle to perform intelligent sorting and transport; in a home-service scenario, people can use simple natural-language commands to have the robot pour water or fetch a package.
Currently, for the robotics industry, Orbbec provides 3D vision sensors covering the full range of technical routes, including monocular structured light, binocular structured light, iToF, dToF, and LiDAR, together with multi-sensor fusion support. At the same time, in response to the development trends of large models and embodied intelligent robots, Orbbec is committed to building a robotics and AI vision middle platform: through R&D on multimodal visual large models and intelligent algorithms, combined with its robot vision sensors, it aims to deliver complete autonomous positioning, navigation, and obstacle-avoidance capabilities, providing downstream customers across the industry with a full-capability platform and a series of solutions for the era of intelligent robots.
Reviewing Editor: Peng Jing