Tencent's robot dog appears on the cover of a Nature sub-journal: as agile as a real dog, it can even play orienteering
Cressy from Aofei Temple
Quantum Bit | Public Account QbitAI
Tencent's robot dog has appeared on the cover of a Nature sub-journal!
Under the control of a new framework, the robot dog's movements look more and more like those of dogs in the real world.
Note the two robot dogs here: they are playing "orienteering", the kind that involves a chase.
In the game, the two robot dogs take the roles of pursuer and evader, and the evader needs to reach a designated location without being caught.
Once it gets there, the two robot dogs swap roles, and the cycle continues until one of them is caught.
One difficulty of the game is a maximum speed limit: neither robot dog can win on speed alone, so both have to plan a strategy.
There is also a harder version with obstacle courses, where the contest is even more intense and the scenes even more exciting.
Behind this robot obstacle-chase competition is a new control framework.
The framework adopts a hierarchical design and uses a generative model to learn the movement patterns of animals, with training data collected from a Labrador retriever.
With this approach, the robot dog no longer relies on physical models or hand-designed reward functions, and can understand and adapt to more environments and tasks, much like a real animal.
Moving like a real dog
This robot dog is named MAX, weighs 14 kg, and has three actuators on each leg, which can provide an average continuous torque of 22 N·m and a maximum of 30 N·m.
One of the highlights of MAX is that it can imitate dogs in the real world.
In an indoor setting, MAX breaks away from the researchers and starts running around freely.
If you put MAX outside, it can also run and play happily on the grass.
This imitation becomes even more lifelike when encountering complex terrain with obstacles.
Going up, MAX can climb stairs swiftly and nimbly.
Going low, it can also slip under obstacles without touching the horizontal bar in front of it at all.
Behind this series of actions is the strategy that MAX's control system learned from the movements of a Labrador retriever.
By imitating real dogs, MAX can also plan higher-level strategies and complete more complex tasks; the chase game shown above is a good example.
It is worth mentioning that, in addition to letting the two robot dogs compete with each other, the researchers also joined the game via gamepad control.
It is not hard to see from the footage that the robot dog in human-controlled mode (No. 1 below) is not as agile as the fully autonomous one (No. 2).
The final result: even with a handicap in their favor (the human-controlled robot dog was allowed a higher maximum speed), the humans still lost to the machine 0:2.
In addition to enabling the robot dog to move flexibly, the framework's greatest advantage is its versatility, allowing for pre-training and knowledge reuse for different task scenarios and robot forms.
In the future, the team also plans to migrate the system to scenarios of humanoid robots and multi-agent collaboration.
So how did the researchers at Robotics X come up with this solution?
Adding a hierarchical framework to a generative model
The core idea behind this control framework is to imitate the movement, perception, and strategy of real animals.
By building pre-trained, reusable, and extensible knowledge at the primitive, environment, and strategy levels, the framework lets the robot understand and adapt to environments and tasks from a broader perspective, just as animals do.
In terms of implementation, the framework adopts hierarchical control: its three levels, the primitive motion controller (PMC), the environmental adaptation controller (EPMC), and the strategy controller (SEPMC), correspond to primitive-level, environment-level, and strategy-level knowledge respectively.
First, a human issues a high-level instruction (such as telling the machine the rules and goal of the game); this is the only part of the process that requires human involvement.
This high-level instruction is received by SEPMC, which formulates a strategy based on the current situation (such as the robot's role and the opponent's position) and then generates a navigation command containing the desired moving direction, speed, and similar information.
The navigation command is passed to EPMC, which combines it with environmental perception (such as a terrain height map or depth information) to choose an appropriate motion mode, forming a categorical distribution from which a suitable discrete latent representation is selected.
Finally, PMC combines this latent representation with the robot's current state (such as joint positions and velocities) to produce the motor control signals, which are then sent for execution.
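Conceptually, one control step flows from SEPMC down through EPMC to PMC. Below is a minimal sketch of that three-level flow; the module names follow the article, but all dimensions, network shapes, and interfaces are illustrative assumptions rather than the authors' actual implementation.

```python
# Minimal sketch of the SEPMC -> EPMC -> PMC inference flow (illustrative only).
import torch
import torch.nn as nn

class PMCDecoder(nn.Module):
    """Primitive level: decodes a discrete latent code plus robot state into joint targets."""
    def __init__(self, num_codes=256, code_dim=32, state_dim=45, action_dim=12):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)        # discrete latent space
        self.net = nn.Sequential(nn.Linear(code_dim + state_dim, 256),
                                 nn.ELU(), nn.Linear(256, action_dim))
    def forward(self, code_idx, robot_state):
        z = self.codebook(code_idx)                              # look up discrete embedding
        return self.net(torch.cat([z, robot_state], dim=-1))    # joint control targets

class EPMC(nn.Module):
    """Environment level: maps perception + navigation command to a distribution over codes."""
    def __init__(self, percep_dim=187, cmd_dim=3, num_codes=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(percep_dim + cmd_dim, 256),
                                 nn.ELU(), nn.Linear(256, num_codes))
    def forward(self, perception, nav_cmd):
        logits = self.net(torch.cat([perception, nav_cmd], dim=-1))
        return torch.distributions.Categorical(logits=logits)   # which primitive to activate

class SEPMC(nn.Module):
    """Strategy level: maps game state (role, positions, history) to a navigation command."""
    def __init__(self, game_dim=16, cmd_dim=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(game_dim, 128), nn.ELU(), nn.Linear(128, cmd_dim))
    def forward(self, game_state):
        return self.net(game_state)                              # e.g. heading + speed

# One control step: strategy -> environment -> primitive -> motor command
sepmc, epmc, pmc = SEPMC(), EPMC(), PMCDecoder()
game_state  = torch.randn(1, 16)    # own role, opponent position, etc. (placeholder)
perception  = torch.randn(1, 187)   # terrain height map / depth features (placeholder)
robot_state = torch.randn(1, 45)    # joint positions, velocities, etc. (placeholder)

nav_cmd  = sepmc(game_state)                     # strategy-level navigation command
code_idx = epmc(perception, nav_cmd).sample()    # pick a discrete motion primitive
action   = pmc(code_idx, robot_state)            # motor control signal sent for execution
```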
The training order is exactly the opposite - starting from PMC and ending with SEPMC.
The first stage, PMC or primitive training, builds basic motor ability.
The training data for this phase came from motion capture of a well-trained medium-sized Labrador retriever.
By guiding the dog through various actions, the authors collected about half an hour of motion sequences covering different gaits (walking, running, jumping, sitting, and so on), sampled at 120 frames per second.
During capture the dog followed different paths, such as straight lines, squares, and circles; in addition, the authors collected about 9 minutes of motion data for going up and down stairs.
To bridge the differences in skeletal structure between animal and robot, the authors used inverse kinematics to retarget the dog's joint motion data onto the robot's joints.
Through further manual adjustments, reference motion data compatible with the quadruped robot were finally obtained.
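For intuition, here is a minimal sketch of the retargeting idea using two-link planar inverse kinematics: a captured foot position in the hip frame is converted into hip and knee joint angles. The link lengths, frame convention, and clamping are illustrative assumptions, not MAX's actual geometry or the authors' retargeting pipeline.

```python
# Planar two-link leg IK sketch (illustrative assumptions, not the real robot model).
import math

def leg_ik(x, z, l1=0.21, l2=0.21):
    """Return (hip_pitch, knee) angles for a foot target (x, z) in the hip frame."""
    d = math.hypot(x, z)
    d = min(d, l1 + l2 - 1e-6)                        # clamp unreachable targets
    # Law of cosines gives the knee bend
    cos_knee = (l1**2 + l2**2 - d**2) / (2 * l1 * l2)
    knee = math.pi - math.acos(max(-1.0, min(1.0, cos_knee)))
    # Hip angle = direction to the foot minus the interior angle at the hip
    alpha = math.atan2(x, -z)
    cos_beta = (l1**2 + d**2 - l2**2) / (2 * l1 * d)
    beta = math.acos(max(-1.0, min(1.0, cos_beta)))
    hip = alpha - beta
    return hip, knee

# Example: map a captured foot position to robot joint targets
print(leg_ik(0.05, -0.30))
```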
△ File photo; it does not show the actual source of the training data
Based on these data, the authors used the encoder of a generative model, a VQ-VAE, to compress and represent the animal's movement patterns and construct PMC's discrete latent space.
Through vector quantization, the continuous latent representations are discretized into predefined discrete embedding vectors, and the decoder generates concrete motion control signals from the selected discrete embedding and the robot's current state.
Based on VQ-VAE, the training goal of PMC is to minimize the deviation between the generated motion trajectory and the reference trajectory.
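As a rough illustration of that objective, the sketch below shows the vector-quantization step (snapping a continuous latent to its nearest codebook entry with a straight-through gradient) and a tracking loss against a reference pose. The codebook size, dimensions, and stand-in decoder are assumptions for illustration only.

```python
# VQ step + tracking loss sketch (illustrative dimensions and placeholder data).
import torch
import torch.nn.functional as F

codebook = torch.nn.Embedding(256, 32)               # predefined discrete embedding vectors

def quantize(z_continuous):
    """Replace each continuous latent with its nearest codebook vector (VQ step)."""
    dists = torch.cdist(z_continuous, codebook.weight)   # (batch, num_codes)
    idx = dists.argmin(dim=-1)                            # nearest code index
    z_q = codebook(idx)
    # Straight-through estimator so gradients still flow back to the encoder
    return z_continuous + (z_q - z_continuous).detach(), idx

z = torch.randn(8, 32, requires_grad=True)           # stand-in for the encoder output
z_q, idx = quantize(z)

decoder = torch.nn.Linear(32 + 45, 12)               # stand-in for the PMC decoder
robot_state = torch.randn(8, 45)                     # current robot state (placeholder)
decoded_pose = decoder(torch.cat([z_q, robot_state], dim=-1))

reference_pose = torch.randn(8, 12)                  # retargeted Labrador motion (placeholder)
tracking_loss = F.mse_loss(decoded_pose, reference_pose)
tracking_loss.backward()                             # gradients reach the encoder via the ST trick
```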
At the same time, the authors introduced a prioritized sampling mechanism that dynamically adjusts the training weight of each motion mode according to its difficulty, ensuring the network fits all of the reference data well, as sketched below.
Through continued iteration and optimization, PMC gradually learns, until convergence, a set of discrete representations that can effectively express complex animal movements.
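A minimal sketch of such difficulty-weighted sampling might look like the following; the clip names and error values are placeholders, not measurements from the paper.

```python
# Difficulty-weighted ("prioritized") sampling over reference clips (placeholder numbers).
import numpy as np

clip_names = ["walk", "run", "jump", "sit", "stairs"]
tracking_error = np.array([0.05, 0.08, 0.30, 0.02, 0.25])   # running error estimate per clip

weights = tracking_error / tracking_error.sum()              # harder clips get more weight
batch = np.random.choice(clip_names, size=8, p=weights)
print(batch)   # mostly "jump" and "stairs" until they are fitted well
```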
The results of the PMC stage provide the basis for EPMC to generate higher-level motion control information.
Building on PMC, EPMC introduces an environmental perception module that receives information from vision, radar, and other sensors, so the policy network can dynamically adjust the motion mode according to the current state of the environment.
The core of EPMC is a probability-generating network that, given the current perception and command signals, produces a probability distribution over the discrete latent space provided by PMC.
This distribution determines which primitive motion patterns should be activated to best fit the current environment and task.
EPMC is trained by minimizing loss functions for environment adaptation and task completion, gradually learning to optimize the motion strategy and improving the robot's adaptability and robustness.
The final SEPMC training phase further enhances the robot's cognitive and planning capabilities, enabling it to formulate and execute high-level strategies in a multi-agent interactive environment.
Building on EPMC, SEPMC generates high-level strategic decisions (such as chasing or evading) according to the current game state (such as its own and the opponent's positions) and the history of interactions.
The pursuit-style orienteering game played by the MAX robots is also how SEPMC is trained.
At this stage, the authors adopted PFSP, an advanced multi-agent reinforcement learning algorithm, to continuously improve the robots' strategies through self-play.
During training, the current policy constantly competes against strong historical opponents, forcing it to learn more robust and efficient strategies.
Thanks to the solid foundation laid in the first two stages, learning this complex strategy is very efficient and converges quickly even with sparse rewards.
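The core of PFSP-style opponent selection can be sketched as below: opponents the current policy still loses to are sampled more often from the pool of past checkpoints. The win rates and the squared weighting function are illustrative choices, not necessarily what the authors used.

```python
# Prioritized self-play opponent sampling sketch (placeholder win rates, illustrative weighting).
import numpy as np

opponents = ["ckpt_100", "ckpt_200", "ckpt_300", "ckpt_400"]
win_rate = np.array([0.95, 0.80, 0.55, 0.40])   # current policy's win rate vs. each checkpoint

priority = (1.0 - win_rate) ** 2                 # focus on opponents we still lose to
p = priority / priority.sum()
opponent = np.random.choice(opponents, p=p)
print(opponent)   # most often "ckpt_400", the strongest remaining opponent
```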
It is worth mentioning that, in such a multi-agent setup, agents that simulate humans can also be introduced to enable machine-machine or human-machine collaboration.
The entire training process above is completed in simulation and then transferred zero-shot to the real environment.
In simulation, physical parameters can be freely controlled; the authors randomized a large number of them (including load, terrain variation, and so on), and the policy obtained through reinforcement learning must cope with all of these changes, yielding stable and general control capabilities.
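A minimal sketch of this kind of domain randomization follows; the parameter names and ranges are purely illustrative assumptions, not the values used in the paper.

```python
# Per-episode resampling of physical parameters (illustrative names and ranges).
import random

def sample_physics():
    return {
        "payload_kg":      random.uniform(0.0, 3.0),    # extra load on the trunk
        "friction":        random.uniform(0.4, 1.2),    # ground friction coefficient
        "motor_strength":  random.uniform(0.8, 1.2),    # scale on actuator torque
        "terrain_height":  random.uniform(0.0, 0.08),   # roughness amplitude in meters
        "control_latency": random.uniform(0.0, 0.02),   # seconds of actuation delay
    }

# Each training episode runs with a freshly sampled set of parameters,
# so the RL policy must perform well across all of them.
for episode in range(3):
    print(episode, sample_physics())
```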
In addition, the authors used LSTMs at every level of the control framework, giving each level a degree of temporal memory and planning ability.
On the sensor side, the authors have mainly verified that the series of complex tasks can be completed using either a motion-capture system or visual perception based solely on a depth camera.
To handle more open and complex environments, the authors plan to further integrate LiDAR, audio, and other sensory inputs for multimodal understanding and better responses to the environment.
Paper address:
https://www.nature.com/articles/s42256-024-00861-3
Project homepage:
https://tencent-roboticsx.github.io/lifelike-agility-and-play/
-over-