Orbbec helps robot customers quickly realize innovative application development and mass production

Publisher: painter | Updated: 2023-12-13 | Source: 奥比中光 (Orbbec) | Author: Lemontree

It's time to give the large model a body.

Recently, Orbbec's R&D team combined a robotic arm with large models, using speech, language, and vision-language large models, supplemented by data from the Orbbec Gemini 2 series depth camera, to create a robotic arm that can understand and execute voice commands.

The project builds on work by Professor Fei-Fei Li's team at Stanford University. By solving a series of engineering problems in generalization, observation, and control, the project brings the robotic arm, driven by a multimodal large model, from the simulation environment into the real world, expanding the application potential of intelligent robotic arms.

Integrating multiple large-model capabilities

Letting the robotic arm understand and execute voice commands

Since last year, the emergence of various large models has triggered a new wave of development in the robotics industry. Although "large models + robots" is still in the early stages of technical exploration, as the two become more deeply integrated, robots are expected to gain a smarter "brain" along with more powerful "eyes" and "body", and to evolve toward embodied intelligence.

The large-model robotic arm created by Orbbec takes voice prompts as input and uses the understanding and visual perception capabilities of multiple large models to generate spatial semantics, allowing the robotic arm to understand commands and execute actions.

First, the robotic arm uses a speech large model to recognize the voice commands of the task issuer. At the same time, it obtains high-quality RGB and depth data of the environment through two Orbbec Gemini 2 binocular structured-light cameras. It then uses SAM, CLIP, and other vision-language large models to understand the scene, performs real-time collision detection, and finally executes the task.
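The pipeline above can be sketched as follows. This is a minimal illustrative outline, not Orbbec's actual implementation: every function and model call here is a placeholder standing in for the real speech model, SAM/CLIP perception, and motion planner.

```python
# Hypothetical sketch of the voice-to-action pipeline described above.
# All names are illustrative placeholders, not Orbbec's actual APIs.

from dataclasses import dataclass

@dataclass
class SceneObject:
    label: str       # e.g. "red cube", from CLIP-style label matching
    position: tuple  # (x, y, z) in metres, from the depth camera

def transcribe(audio) -> str:
    """Placeholder for the speech large model (ASR)."""
    return "put the red cube into the yellow box"

def perceive(rgb, depth) -> list:
    """Placeholder for SAM segmentation + CLIP labelling on RGB-D input."""
    return [SceneObject("red cube", (0.10, 0.20, 0.05)),
            SceneObject("yellow box", (0.30, 0.20, 0.00))]

def plan(command: str, objects: list) -> list:
    """Very naive grounding: act on the objects mentioned in the command."""
    mentioned = [o for o in objects if o.label in command]
    return [("grasp", mentioned[0].position), ("place", mentioned[1].position)]

def run_pipeline(audio, rgb, depth) -> list:
    command = transcribe(audio)      # 1. speech large model
    objects = perceive(rgb, depth)   # 2. RGB-D perception
    return plan(command, objects)    # 3. grounding and planning

actions = run_pipeline(audio=None, rgb=None, depth=None)
print(actions)  # [('grasp', (0.1, 0.2, 0.05)), ('place', (0.3, 0.2, 0.0))]
```

A real system would replace each placeholder with a model invocation and pass the resulting waypoints to the arm's controller with collision checking in the loop.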

Based on this principle, Orbbec can enable the robotic arm to complete a series of instructions, such as:

Please remember the current status

Put the red cube into the yellow box

Put the green square into the white box

Rotate the blue block 30° counterclockwise

Move the blue block 10cm towards the green block

Put the blue block on top of the green block

Please restore the original state

Please put all the blocks into the yellow box

At present, the project has established a 1.0-stage baseline for deploying multimodal large models on robotic arms. Orbbec is further optimizing multimodal command understanding, multi-sensor fusion perception, robotic arm trajectory planning and control, and end-effector grasping control. In the future, it will launch large-model robots that are more intelligent and flexible and can adapt to more complex operating scenarios.

Overcoming the generalization, observation, and control challenges

From simulation to reality

At present, much research on robot agents, both in China and abroad, is carried out in simulation environments. Moving from virtual simulation to the real world requires overcoming a series of engineering problems. For example, in a simulation environment the camera follows an ideal imaging model and is unaffected by imaging distortion, ambient lighting, and so on; this poses a challenge to the agent's generalization ability in real scenes.

Building on a pre-trained multimodal robotic arm model, Orbbec's R&D team overcame a series of implementation difficulties in generalization, observation, and control:

To achieve fast and accurate voice input and understanding, a pre-trained speech large model is introduced so that the robotic arm responds promptly to voice commands.

To ensure that the robotic arm has sufficient generalization capabilities in the real world, a large vision-language model is used to enable the robotic arm to understand and adapt to complex scenarios and perform tasks robustly in diverse environments.

To address the pre-trained model's assumption of ideal camera imaging, a new calibration scheme was designed and the camera's auto-exposure (AE) strategy optimized, overcoming challenges from ambient light, imaging distortion, and perspective deformation and making the robotic arm more robust.

To improve the safety of the robotic arm in complex environments, depth-camera collision detection and grasp correction are introduced to optimize arm control, improving the performance, accuracy, and adaptability of grasping.
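The collision-detection idea in the last item can be illustrated with a toy example. The article does not describe Orbbec's actual algorithm; this sketch only shows the core principle of checking observed depth against a planned motion, assuming a simple per-pixel depth map and a fixed safety margin.

```python
# Toy sketch of depth-camera collision checking: before moving the
# gripper toward a target, verify that no observed surface in the
# relevant image region is closer to the camera than the target depth
# allows. A real system would reason in 3D with the arm's kinematic
# model; this shows only the core idea.

def collision_free(depth_map, region, target_depth, margin=0.02):
    """depth_map: 2D list of depths in metres; region: (row0, row1, col0, col1)."""
    r0, r1, c0, c1 = region
    for row in depth_map[r0:r1]:
        for d in row[c0:c1]:
            if d < target_depth - margin:  # an obstacle sits in front of the target
                return False
    return True

depth = [[0.50, 0.50, 0.50],
         [0.50, 0.30, 0.50],  # an obstacle at 0.30 m
         [0.50, 0.50, 0.50]]

print(collision_free(depth, (0, 3, 0, 3), target_depth=0.45))  # False: obstacle found
print(collision_free(depth, (0, 1, 0, 3), target_depth=0.45))  # True: top row is clear
```

Grasp correction would then adjust the planned pose whenever such a check fails, rather than aborting the task outright.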

Through the introduction of and innovation in these key technologies, Orbbec has overcome the cross-domain difficulties of multimodal robotic arms and opened up the "last mile" toward engineering deployment.

In the field of robot vision, Orbbec has more than eight years of industry experience and has served more than 100 robotics companies. Through years of cooperation, Orbbec has accumulated rich experience in robot 3D sensors, LiDAR, models, and more, helping robot customers quickly realize innovative application development and mass production.

Laying out multimodal visual large models

Potential applications of intelligent robots

In what scenarios can a robotic arm that integrates multiple large model capabilities be used?

As the robot's "eyes" (visual sensors), "brain" (large models), and "body" continue to develop and evolve, intelligent robots and robotic arms are expected to land first in scenarios such as manufacturing, flexible logistics, and commercial services.

For example, in an automated factory scenario, a robotic arm based on a multimodal large model can be combined with an unmanned vehicle to perform intelligent sorting and transportation; in a home service robot scenario, people can use simple natural language commands to let the robot help pour water or pick up express deliveries.

Currently, for the robotics industry, Orbbec provides 3D vision sensors covering the full range of technical routes, including monocular structured light, binocular structured light, iToF, dToF, and LiDAR, along with multi-sensor fusion support. At the same time, in response to the development of large models and embodied intelligent robots, Orbbec is committed to building a robotics and AI vision middle platform. Through the research and development of multimodal visual large models and intelligent algorithms, combined with robot vision sensors, it aims to form complete capabilities for autonomous positioning, navigation, and obstacle avoidance, providing downstream customers across the industry with a full-capability platform and a series of solutions to welcome the era of intelligent robots.

Reviewing Editor: Peng Jing



Copyright © 2005-2024 EEWORLD.com.cn, Inc. All rights reserved 京ICP证060456号 京ICP备10001474号-1 电信业务审批[2006]字第258号函 京公网安备 11010802033920号