To achieve its goal, Google AI transformed its body into this...

Latest update time：2018-10-15


Strengthening chestnut from Aofei Temple
Produced by Quantum Bit | Public Account QbitAI

It is no longer uncommon for reinforcement learning AI to play games.

The agent dies and respawns over and over in the virtual world, and gradually learns which strategies keep it alive longer and earn more rewards.

What the AI may not realize, though, is that when it plays poorly, the problem may lie in the agent's body structure.

△ Today’s protagonist may be the rubber fruit in the AI world

If an agent can learn a policy and improve its body shape at the same time, we may end up with a stronger reinforcement learning AI.

So David Ha of Google Brain designed a two-pronged training plan for his AI:

The agent continuously adjusts its body shape, such as the length of its legs, to find the structure that best suits the current task, and it trains its policy at the same time.

△ Before body training (left) vs. after body training (right): the speed is obviously different

As you can see, the agent has made its legs thinner and become much faster.

It can also develop off-road capabilities.

On rugged terrain, the agent with the original body often fell over.

△ Before the transformation, falls were common

But after the agent developed a better body shape, falls almost disappeared, and policy training time dropped to 30% of the original.

With a well-designed body, the policy becomes easier to learn.

So what kind of body can cut training cost and improve performance at the same time? Read on to find out.

What’s the secret to being beautiful and intelligent?

In the past, the shape and structure of agents were mostly fixed, and training focused only on the policy. However, the shape preset by the environment is usually not the ideal structure for a specific task.

Therefore, as mentioned above, the policy must be learned and the body must be optimized along with it.

In this setting, training only the weight parameters of the policy network is not enough; the environment must be parameterized as well.

Structural features of the body, such as the length, width, mass, and orientation of the upper or lower leg, are all part of this environment.

Here the parameter vector w combines the policy network weights with the environment parameter vector, so that body and skill develop together.

As w is continuously updated, the agent becomes stronger and stronger.
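
The idea of a single vector w that holds both policy weights and body parameters can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dimensions, the `toy_reward` function (a made-up stand-in for a real simulator rollout), and the optimizer, which is a plain hill-climbing evolution strategy chosen for brevity and not necessarily the method used in the paper.

```python
import random

POLICY_DIM = 4  # hypothetical number of policy-network weights
BODY_DIM = 2    # hypothetical body parameters, e.g. leg length and leg width


def split(w):
    """Split the joint vector w into (policy weights, body parameters)."""
    return w[:POLICY_DIM], w[POLICY_DIM:]


def toy_reward(w):
    """Made-up stand-in for a rollout score; highest at an arbitrary optimum."""
    policy, body = split(w)
    target_policy = [0.5, -0.3, 0.8, 0.1]
    target_body = [1.4, 0.6]
    error = sum((p - t) ** 2 for p, t in zip(policy, target_policy))
    error += sum((b - t) ** 2 for b, t in zip(body, target_body))
    return -error


def initial_w():
    """Zero policy weights plus the default body (both scale factors at 1.0)."""
    return [0.0] * POLICY_DIM + [1.0] * BODY_DIM


def train(generations=200, population=32, sigma=0.1, seed=0):
    """Hill-climbing evolution strategy over the joint vector w."""
    rng = random.Random(seed)
    best = initial_w()
    best_reward = toy_reward(best)
    for _ in range(generations):
        # Perturb policy weights and body parameters together.
        candidates = [[x + rng.gauss(0.0, sigma) for x in best]
                      for _ in range(population)]
        champion = max(candidates, key=toy_reward)
        champion_reward = toy_reward(champion)
        if champion_reward > best_reward:
            best, best_reward = champion, champion_reward
    return best, best_reward


if __name__ == "__main__":
    w, reward = train()
    policy, body = split(w)
    print("body parameters:", [round(b, 2) for b in body])
    print("reward:", round(reward, 4))
```

Because the optimizer sees only w, it does not need to know which entries describe the body and which describe the policy. The split happens only when the environment is built and the policy is evaluated.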

Is body modification useful? Just compare the result with an agent that only learns a policy and keeps its structure fixed. If the reward improves, the AI has found a body shape better suited to the environment.

Note that in order to cultivate AI's adventurous spirit, researchers have increased the rewards for difficult actions to guide the agent to challenge itself.

Body transformation, very good results

The experiments take place in two sets of environments: one is Roboschool, a robot simulation library based on the Bullet physics engine, and the other is OpenAI Gym, using environments based on the Box2D physics engine.

Both types of environments are parameterized, and AI can learn to adjust the parameters in them.

Unlock high score poses

First, we come to the football field (RoboschoolAnt-v1). The agent here, Ant, is a four-legged creature. Each leg is divided into three segments and controlled by two joints. The legs are left for the AI to adjust, while the spherical body is not adjustable.

△ Three-section leg, the innermost section is less obvious

The task is simple: run as far as possible.

After training (right in the picture above), the most obvious change is that the agent's legs have become thinner and longer, and the four legs now differ in length, breaking the symmetry. With the new body shape, the step frequency also increased considerably, and the long-legged creature crossed the brown track sooner.

Take a look at the reward: over 100 trials, the original structure scored 3447 ± 251, while the new structure scored 5789 ± 479, a significant improvement.

△ Left: the original; right: after body training (the red line represents the lidar)

Next, we enter the grassy scene (BipedalWalker-v2, a Box2D-based Gym environment). The agent here is a biped that moves forward guided by its "lidar".

The task is to cross gentle terrain within a time limit (this is the easy version; the hard version, full of obstacles, appears below). In terms of scoring, the task counts as solved if the average score over 100 rollouts exceeds 300 points.
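
The success criterion can be written down as a short check. This is a sketch under the assumption that "solved" means the mean score over the most recent 100 rollouts is above 300; the function name `is_solved` and its parameters are hypothetical, not part of Gym's API.

```python
def is_solved(scores, threshold=300.0, window=100):
    """Return True if the mean of the last `window` scores exceeds `threshold`."""
    if len(scores) < window:
        return False  # not enough rollouts yet to judge
    recent = scores[-window:]
    return sum(recent) / window > threshold


if __name__ == "__main__":
    print(is_solved([347.0] * 100))  # enough rollouts, mean above 300
    print(is_solved([299.0] * 100))  # enough rollouts, mean below 300
```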

The original body scored 347 points, while the optimized body scored 359 points.

Both bodies solved the task, but the modified agent not only has thinner legs: all four of its leg segments also changed in length, giving the AI a new bouncing gait. The movement looks easier, and the score is higher than before.

Good figure can accelerate strategy learning

Here is the hardcore version of the grassy scene above (BipedalWalkerHardcore-v2): the road is rugged, full of peaks and pits, and one careless step sends the agent into the abyss.

Here David Ha wants to show that a good body actually benefits the agent's policy learning, rather than the setup being a crude combination of "studying two subjects at once".

Unlike the uniformly thin legs seen earlier, this time the agent's rear leg evolved a thick lower segment whose length is close to the width of the gaps.

△ The red line represents the lidar

This way, when crossing a gap, the rear leg can act as a bridge so that the agent passes smoothly without falling over.

Meanwhile, the front leg serves as a "danger detector", probing for obstacles ahead. As a supplement to the "lidar", it provides a basis for the rear leg's next move.

The key point is that while shaping this new body, the AI learned a policy that clears the level in just 12 hours. By comparison, the original training method without body optimization took 40 hours (feedforward policy network, 96 GPUs).

That is to say, elegant structures accelerate the learning process of the agent.
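
The figures quoted in this article can be put side by side with a few lines of arithmetic. The sketch below uses only the scores and training times reported above; the helper name `relative_gain` is made up for illustration.

```python
def relative_gain(before, after):
    """Fractional improvement of `after` over `before`."""
    return (after - before) / before


ant_gain = relative_gain(3447, 5789)   # RoboschoolAnt-v1 mean scores
walker_gain = relative_gain(347, 359)  # BipedalWalker-v2 scores
hardcore_time_ratio = 12 / 40          # BipedalWalkerHardcore-v2 training hours

if __name__ == "__main__":
    print(f"Ant reward gain: {ant_gain:.1%}")
    print(f"BipedalWalker reward gain: {walker_gain:.1%}")
    print(f"Hardcore training time vs. original: {hardcore_time_ratio:.0%}")
```

The 12-hour versus 40-hour comparison is where the "30% of the original" training time comes from.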

△ Add body optimization (orange), training efficiency is significantly improved, and the goal is achieved in about 1,000 generations

Wild ideas don’t just fall from the sky

First, how could David Ha foresee that improving the agent's structure would improve training efficiency?

He said he was inspired by nature.

△ Wrong demonstration

Some animals can still jump and swim after brain death.

That is to say, many behaviors of organisms do not depend on the brain.

There is a theory called Embodied Cognition that holds that many features of cognition are not determined by the brain alone: all aspects of the organism, such as the motor system, the sensory system, the interaction between the organism and the environment, etc., will have an impact on cognition.

For example, during long-term training, athletes not only get physical exercise, but also develop certain specific psychological qualities.

David Ha believes that this phenomenon may also occur in AI: training the body and thus affecting cognition.

Second, the idea of changing an agent's structure through training also comes from nature.

△ Flamingos are not originally red, but their feathers turn red after eating small fish and shrimps

High school biology teaches us that phenotype is the result of the interaction between genotype and environment.

Likewise, different virtual environments let the body structures best adapted to them stand out. In this way, the AI can use the environment to select a body and develop more sophisticated skills.

Fate is wonderful beyond words.

Paper portal:
https://designrl.github.io/
