Humanoid AI hide-and-seek game amazes netizens: self-taught, able to scale walls and leap over rooftops, rich in expression and capable of cooperation; Yao Class star Wu Yi took part
Yuyang & Annie, from Aofei Temple
Produced by Quantum Bit | Public Account QbitAI
In complex terrain, the little blue men try their hardest to hide, while the little red men try their hardest to seek them out. This life-or-death confrontation is not a CG animation, but:
OpenAI's agents are literally playing hide-and-seek.
This is serious research, and its purpose is to teach AI to cooperate and compete. The rules of the game are not given in advance; the AIs have to figure them out on their own.
Because the demo looked so impressive, netizens even began to question OpenAI's true identity.
A Twitter user said:
In reality, OpenAI is an animation company.
Some netizens expressed amazement:
Oh my goodness, the production quality, the backgrounds, even the facial expressions of the agents are adorable. Is this a scientific paper or a new AI attraction at Disney's Epcot theme park?
In this large-scale AI hide-and-seek study, which has been open-sourced, the scenes are cool and each agent has ideas of its own:
Teammates collaborate with one another and jointly fend off the opposing side...
Are these agents alive?
How does AI play hide-and-seek?
In this game of hide-and-seek, the little red men are the "ghosts" (the seekers): each comes with its own little radar and can chase hiders all over the field.
△ The red ghost can also push boxes around
The little blue man’s task is simple: run.
Compared with the ghosts and their built-in sensors, the hiders' skill is to use objects such as boxes to build obstacles and lock them in place.
At first, the AIs had no idea what they could do and just ran and chased out of "instinct".
But after 25 million games, the little blue men had learned to avoid being discovered by moving boxes and building shelters.
After another 75 million games, the red ghosts learned to use ramps to break into those shelters!
After suffering another 10 million defeats, the little blue men built shelters once again, and this time took the ramps with them so the ghosts could not use them.
What's even more amazing is that the AIs are not only capable of fighting alone; they have also learned to work in teams.
Look at the blue team's level of collaboration. It is smooth, seamless, and strategic:
What, you think the terrain is too simple? After nearly 500 million rounds of training, the AIs unlocked a more complex version:
This group of AIs is really amazing.
The Secret of Hide and Seek
Once again: the above is not CG, not CG, not CG.
This is a new study from OpenAI. Through multi-agent competition, a goal as simple as hide-and-seek, and standard reinforcement learning algorithms, the researchers found that the AIs created a self-supervised autocurriculum without being taught the rules in advance.
It produced multiple rounds of distinct emergent strategies, along with complex tool use and team coordination.
A curriculum can be thought of as a series of challenges; an autocurriculum is one in which each challenge is generated by the system itself. (The concept was proposed by DeepMind; see the link at the end of this article.)
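To make the idea concrete, here is a toy, self-contained sketch of an autocurriculum loop. It is purely illustrative: `train_one_round` is a hypothetical stand-in for a real RL update (such as a round of PPO against a frozen opponent), and "skill" is reduced to a single number.

```python
# Toy sketch: competition alone generates the curriculum.
# train_one_round is hypothetical; in the real system it would be an RL update.

def train_one_round(skill, opponent_skill):
    # Pretend each round of training improves a side just enough to counter
    # its opponent's latest strategy.
    return max(skill, opponent_skill) + 1

hider_skill, seeker_skill = 0, 0
for stage in range(6):  # six stages, echoing the six emergent strategies
    hider_skill = train_one_round(hider_skill, seeker_skill)
    seeker_skill = train_one_round(seeker_skill, hider_skill)
    print(f"stage {stage}: hiders={hider_skill}, seekers={seeker_skill}")

# No designer specifies this sequence of challenges; each new challenge is
# simply the other side's most recent behavior.
```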
△ AIs discovered up to six unique strategies during training
Moreover, as the environment becomes more complex, the AIs become more powerful. Compared with other self-supervised reinforcement learning methods, this group of AIs behaves more like humans.
OpenAI calls this goalless exploration.
The AIs did not learn the rules of hide-and-seek in advance, but developed freely based on their understanding of the game world.
To achieve this, the researchers used entity-centric observations and adopted an attention mechanism to capture object-level information.
In a given environment, each agent acts independently based on its own observations and hidden memory states.
Each object is embedded and passed through a masked residual self-attention block, where attention operates over objects rather than over time.
Objects that are occluded or outside the agent's field of view are masked out, so the agent cannot use information about them.
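As a concrete illustration of the mechanism just described, here is a minimal PyTorch sketch of a masked residual self-attention block over entities. It is a reconstruction from the description above, not OpenAI's released code; the class name, dimensions, and layer choices are all assumptions.

```python
import torch
import torch.nn as nn

class MaskedEntityAttention(nn.Module):
    """Residual self-attention over entities (objects), not over time steps."""
    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, entities, occluded_mask):
        # entities:      (batch, num_entities, embed_dim), one embedding per object
        # occluded_mask: (batch, num_entities), True where an object is occluded
        #                or out of view, so attention ignores it entirely
        attended, _ = self.attn(entities, entities, entities,
                                key_padding_mask=occluded_mask)
        # Residual connection: attention refines, rather than replaces,
        # each entity's embedding
        return self.norm(entities + attended)

# Usage: 5 objects, two of them hidden from this agent
x = torch.randn(1, 5, 64)
mask = torch.tensor([[False, False, True, True, False]])
out = MaskedEntityAttention()(x, mask)  # shape: (1, 5, 64)
```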
The agents' policies are trained via self-play and Proximal Policy Optimization (PPO). During optimization, an agent may use privileged information about occluded objects and other agents in its value function.
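This privileged value function amounts to an asymmetric actor-critic setup. Below is a rough sketch under the assumption of simple MLP heads (the module names are hypothetical, not the paper's architecture); since the value head is only needed to compute advantages during PPO updates, the executed policy never depends on hidden information.

```python
import torch
import torch.nn as nn

class AsymmetricActorCritic(nn.Module):
    """Policy sees only the agent's masked observation; the value head may also
    consume privileged full-state features (occluded objects, other agents)."""
    def __init__(self, obs_dim, priv_dim, act_dim, hidden=128):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim))
        self.value = nn.Sequential(
            nn.Linear(obs_dim + priv_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, obs, priv):
        logits = self.policy(obs)                       # used at train and test time
        value = self.value(torch.cat([obs, priv], -1))  # used only during training
        return logits, value
```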
In addition, large-scale training pays off significantly: increasing the batch size greatly speeds up convergence, without much impact on sample efficiency.
So is an AI trained this way actually robust?
Rather than just talk about it, why not give it an IQ test?
OpenAI's researchers designed a suite of five benchmark intelligence tests, divided into two domains, cognition and memory:
- Object counting: tests whether the agent has object permanence, i.e., understands that things still exist even when it cannot perceive them.
- Lock and return: tests whether the agent can remember its original position while carrying out a new task.
- Sequential lock: 4 boxes are placed in 3 random rooms without doors, but each room has a ramp; the agent must lock all the boxes in a specific order that it cannot observe.
- Blueprint construction: the agent must move the boxes in the room onto four specified target locations.
- Shelter construction: the agent must build a shelter out of boxes around a cylinder.
The results show that the AI pre-trained in the hide-and-seek environment outperforms the baselines on three tasks (lock and return, sequential lock, and blueprint construction) but performs worse on object counting and shelter construction.
△ The blue line is the AI pre-trained in the hide-and-seek environment
Not limited to game scenes
This research attracted a lot of attention from netizens, and in less than a day, the tweet had received over 2,000 likes.
A software engineer remarked that this research points straight at the difficulty of the agent alignment problem: even in a task as simple as hide-and-seek, AI can behave in unexpected ways.
Commenters on Hacker News noted that, even cooler, this research could later be applied to real-world robots, letting them learn to overcome challenges.
OpenAI itself also weighed in, saying the research pushes forward in four directions.
First, it demonstrates that multi-agent autocurricula lead to multiple distinct, compounding phase shifts in agent strategy.
Second, it verifies that in a physically grounded simulated environment, multi-agent autocurricula can train human-relevant skills, such as using tools to achieve goals.
Third, the study proposes a framework for evaluating agents in open-ended environments, along with a suite of targeted intelligence tests, which should serve as a reference for future agent research.
Finally, the study open-sources the environments and the code for building them, which should encourage further research on physically grounded multi-agent autocurricula.
Simple rules, multi-agent competition, and standard large-scale reinforcement learning algorithms can motivate agents to learn complex strategies and skills in an unsupervised manner.
Looking ahead, the significance of this research is not limited to theory or to games; it may extend into every corner of daily life.
Foreign media outlet VentureBeat quoted DeepMind CEO Demis Hassabis's views on game AI in its report:
Game AI is a stepping stone to general AI. The real reason we study these games is that they are a very convenient testing ground for developing algorithms.
We are developing algorithms that can be transferred to the real world to solve genuinely challenging problems and help experts in those fields.
Whether it is DeepMind or OpenAI, using games to develop technology that can be applied to real-world scenarios is itself a way of creating a small world.
A Yao Class graduate took part
The paper is from Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew of OpenAI and Igor Mordatch of Google Brain.
The first author, Bowen Baker, received his bachelor’s and master’s degrees in electrical engineering and computer science. He has been working at OpenAI since December 2017 as a research scientist, focusing on multi-agent research.
Another young and promising Chinese researcher on the author list is Yi Wu (Wu Yi), who entered Tsinghua University's Yao Class in 2010 and studied under Turing Award winner Professor Yao Qizhi (Andrew Chi-Chih Yao).
As a member of the Yao Class, of which it is said that "half the nation's talent gathers at Tsinghua, and half of Tsinghua's talent is in the Yao Class," Wu Yi interned at major tech companies including Microsoft, Facebook, and Toutiao during his undergraduate years, accumulating rich internship experience.
From 2014 to 2019, Wu Yi went to the University of California, Berkeley to study artificial intelligence, with his main research directions being deep reinforcement learning, natural language processing, and probabilistic programming.
Wu Yi has published more than ten papers at top AI conferences, including IJCAI 2016, AAAI 2017, EMNLP 2017, ICML 2018, and NIPS 2018; this year he also contributed to two AAAI 2019 oral papers.
He has also made his mark in competitions: he is an ACM/ICPC North America champion, a World Finals silver medalist, and an IOI 2010 silver medalist.
According to the website of Tsinghua University's Institute for Interdisciplinary Information Sciences (IIIS) and Wu Yi's own CV, he will return to Tsinghua next year: the 28-year-old rising scholar will join IIIS as an assistant professor.
From the Yao Class, and back to the Yao Class: this is not only a harvest, but also a passing of the torch.
Wu Yi's resume:
https://jxwuyi.weebly.com/contest-and-interest.html
Portal
Finally, let's enjoy the full video from the "animation company" OpenAI.
Blog:
https://openai.com/blog/emergent-tool-use/
Code:
https://github.com/openai/multi-agent-emergence-environments
HackerNews:
https://news.ycombinator.com/item?id=20996771
VentureBeat report:
https://venturebeat.com/2019/09/17/openai-and-deepmind-teach-ai-to-work-as-a-team-by-playing-hide-and-seek/
Paper (Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research):
https://arxiv.org/pdf/1903.00742.pdf
— over —