It's time to give the large model a body.
Recently, Orbbec's R&D team has combined a robotic arm with large models: using speech and vision-language large models, supplemented by data from the Orbbec Gemini 2 series depth cameras, they have created a robotic arm that can understand and perform voice-issued tasks.
The project builds on work by Professor Fei-Fei Li's team at Stanford University. By solving a series of engineering problems in generalization, observation, and control, the project brings the robotic arm based on multimodal large models from the simulated environment into the real world, expanding the application potential of intelligent robotic arms.
Integrating multiple large-model capabilities
Letting the robotic arm understand and execute voice commands
Since last year, the emergence of various large models has triggered a new wave of development in the robotics industry. Although "large models + robots" is still in the early stages of technical exploration, as the two are integrated more deeply, robots are expected to gain a smarter "brain", more capable "eyes", and a stronger "body", evolving toward embodied intelligence.
The large-model robotic arm created by Orbbec takes voice prompts as input and uses the language understanding and visual perception capabilities of multiple large models to generate spatial semantics, allowing the arm to understand instructions and execute the corresponding actions.
First, the robotic arm uses a speech large model to recognize the task issuer's voice commands; at the same time, it acquires high-quality RGB and depth data of the environment through two Orbbec Gemini 2 binocular structured-light cameras; it then uses vision and vision-language models such as SAM and CLIP to understand the scene, performs real-time collision detection, and finally executes the task.
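The article does not publish the implementation, but the flow it describes can be outlined in code. The following minimal Python sketch shows one way such a voice-to-action pipeline could be wired together; every function in it is a hypothetical placeholder for the corresponding component (speech model, SAM/CLIP grounding, motion planner), not an Orbbec API.

```python
"""Minimal sketch of the voice-to-action pipeline described above.

Every function below is a hypothetical placeholder standing in for a real
component (speech model, SAM + CLIP grounding, motion planner); none of
this is an Orbbec API.
"""
from dataclasses import dataclass


@dataclass
class Pose:
    """A 3D position recovered from the depth camera (metres)."""
    x: float
    y: float
    z: float


def transcribe(audio) -> str:
    # Placeholder for the speech large model.
    return "put the red cube into the yellow box"


def ground_query(instruction: str, rgb, depth) -> tuple[Pose, Pose]:
    # Placeholder for SAM segmentation + CLIP matching + depth lookup,
    # which turn the phrase into grasp and place poses.
    return Pose(0.30, 0.10, 0.05), Pose(0.45, -0.20, 0.05)


def plan_motion(grasp: Pose, place: Pose) -> list[Pose]:
    # Placeholder planner: a straight-line, two-waypoint trajectory.
    return [grasp, place]


def collides(trajectory: list[Pose], depth) -> bool:
    # Placeholder for the real-time, depth-based collision check.
    return False


def run_voice_command(audio, rgb, depth) -> None:
    instruction = transcribe(audio)
    grasp, place = ground_query(instruction, rgb, depth)
    trajectory = plan_motion(grasp, place)
    if collides(trajectory, depth):
        raise RuntimeError("planned trajectory intersects an obstacle")
    print(f"executing: {instruction!r} via {len(trajectory)} waypoints")


run_voice_command(audio=None, rgb=None, depth=None)
```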
Based on this pipeline, Orbbec enables the robotic arm to complete a series of instructions, such as:
Please remember the current status
Put the red cube into the yellow box
Put the green square into the white box
Rotate the blue block 30° counterclockwise
Move the blue block 10cm towards the green block
Put the blue block on top of the green block
Please restore the original state
Please put all the blocks into the yellow box
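To make the grounding step concrete: commands like "Put the red cube into the yellow box" require matching a language phrase against candidate objects in the image. Below is a minimal sketch of that step using the publicly available CLIP model via Hugging Face transformers; the synthetic color swatches stand in for crops produced by a segmenter such as SAM, and this illustrates the general technique rather than Orbbec's implementation.

```python
# Sketch: score candidate object crops (e.g. SAM mask crops) against a
# language query with CLIP. Illustrative only; not Orbbec's implementation.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical crops of segmented objects; here, synthetic stand-ins.
crops = [Image.new("RGB", (224, 224), c) for c in ("red", "yellow", "green")]

inputs = processor(text=["a red cube"], images=crops,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_text   # shape: (1, num_crops)

best = logits.softmax(dim=-1).argmax().item()
print(f"crop {best} best matches the query")   # expected: the red crop
```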
At present, the project has established a 1.0-stage baseline for applying and deploying multimodal large models on robotic arms. Orbbec is further optimizing multimodal command understanding, multi-sensor fusion perception, robotic arm trajectory planning and control, and end-effector grasping control, and will in the future launch large-model robots that are more intelligent, more flexible, and suited to more complex operating scenarios.
Overcoming the generalization, observation, and control challenges
From simulation to reality
At present, much of the research on robot agents, in China and abroad, is carried out in simulation environments. Moving from virtual simulation to the real world requires overcoming a series of engineering problems. For example, in a simulation environment the camera follows an ideal imaging model and is unaffected by imaging distortion, ambient lighting, and so on, which poses a challenge to the agent's ability to generalize to real scenes.
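As one concrete example of this gap: real lenses introduce radial and tangential distortion that an idealized simulation camera does not have, and a common mitigation is to calibrate the camera and undistort each frame before it reaches the perception stack. The OpenCV snippet below shows the idea; the intrinsics and distortion coefficients are illustrative values, not Gemini 2 calibration data.

```python
# Sketch: undistort a camera frame with OpenCV before perception.
# K and dist are illustrative values only; a real system would obtain
# them from calibration (e.g. cv2.calibrateCamera).
import cv2
import numpy as np

K = np.array([[600.0,   0.0, 320.0],      # fx, 0,  cx
              [  0.0, 600.0, 240.0],      # 0,  fy, cy
              [  0.0,   0.0,   1.0]])
dist = np.array([-0.12, 0.05, 0.001, 0.001, 0.0])   # k1, k2, p1, p2, k3

frame = np.zeros((480, 640, 3), dtype=np.uint8)     # stand-in for a real frame
undistorted = cv2.undistort(frame, K, dist)
print(undistorted.shape)                            # same size, distortion removed
```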
Building on a pre-trained multimodal robotic arm model, Orbbec's R&D team overcame a series of implementation difficulties in generalization, observation, and control:
To achieve fast and accurate voice input and understanding, a pre-trained speech large model is introduced so that the robotic arm responds promptly to voice commands.
To ensure the robotic arm has sufficient generalization capability in the real world, a vision-language large model is used so that the arm can understand and adapt to complex scenes and perform tasks robustly in diverse environments.
To bridge the gap between the ideal camera imaging assumed by the pre-trained model and real cameras, a new calibration scheme was designed and the camera's auto-exposure (AE) strategy optimized, addressing challenges from ambient light, imaging distortion, and perspective deformation and making the robotic arm more robust.
To improve the robotic arm's safety in complex environments, depth-camera collision detection and grasp correction are introduced to optimize arm control and improve grasping performance, accuracy, and scene adaptability (a simplified sketch of the collision check follows this list).
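The article does not describe how the depth-based collision check works, but a deliberately simplified version of the idea can be sketched as follows: project a planned waypoint into the depth image and flag it if it would pass closer to an observed surface than a safety margin. All names and numbers below are hypothetical; a real system would check the whole arm geometry, not a single point.

```python
import numpy as np


def waypoint_collides(depth_m: np.ndarray, u: int, v: int, wp_depth_m: float,
                      margin_m: float = 0.05, window: int = 5) -> bool:
    """Flag a waypoint whose projected pixel (u, v) would pass closer to the
    observed surface than a safety margin. Deliberately simplified; a real
    system would test the whole arm geometry, not a single point."""
    h, w = depth_m.shape
    v0, v1 = max(0, v - window), min(h, v + window + 1)
    u0, u1 = max(0, u - window), min(w, u + window + 1)
    patch = depth_m[v0:v1, u0:u1]
    valid = patch[patch > 0]          # zeros are missing depth readings
    if valid.size == 0:
        return False                  # no data: cannot confirm a collision
    # Collision if the waypoint sits at or beyond the nearest observed surface.
    return wp_depth_m >= valid.min() - margin_m


# Toy usage with a synthetic depth map (values in metres).
depth = np.full((480, 640), 1.2, dtype=np.float32)
depth[200:280, 300:380] = 0.6         # an obstacle 0.6 m from the camera
print(waypoint_collides(depth, u=320, v=240, wp_depth_m=0.7))   # True
print(waypoint_collides(depth, u=50,  v=50,  wp_depth_m=0.7))   # False
```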
Through the introduction of and innovation on these key technologies, Orbbec has overcome the cross-domain difficulties of multimodal robotic arms and cleared the "last mile" toward engineering deployment.
In the field of robot vision, Orbbec has more than eight years of industry experience and has served more than 100 robotics companies. Through years of cooperation, it has accumulated deep experience in robot 3D sensors, LiDAR, models, and more, helping robotics customers rapidly develop innovative applications and reach mass production.
Laying out multimodal visual large models
Exploring the application potential of embodied robots
In what scenarios can a robotic arm that integrates multiple large model capabilities be used?
As robots' "eyes" (vision sensors), "brain" (large models), and "body" (the robot platform itself) continue to develop and evolve, intelligent robots and robotic arms are expected to be deployed first in scenarios such as manufacturing, flexible logistics, and commercial services.
For example, in an automated factory, a robotic arm based on a multimodal large model can be combined with an unmanned vehicle to perform intelligent sorting and transport; in a home-service scenario, people can use simple natural-language commands to have the robot pour water or fetch a package.
Currently, for the robotics industry, Orbbec provides 3D vision sensors covering the full range of technical routes, including monocular structured light, binocular structured light, iToF, dToF, and LiDAR, together with multi-sensor fusion support. At the same time, in response to the development trends of large models and embodied intelligent robots, Orbbec is committed to building a robotics and AI vision middle platform: through R&D on multimodal visual large models and intelligent algorithms, combined with its robot vision sensors, it aims to deliver complete autonomous positioning, navigation, and obstacle-avoidance capabilities, providing downstream customers across the industry with a full-capability platform and a series of solutions for the era of intelligent robots.
Reviewing Editor: Peng Jing