DeepMind proposes a new neural network architecture to extract key points from videos using an unsupervised method | Paper
Tong Ling from Aofei Temple
Produced by Quantum Bit | Public Account QbitAI
Extracting key points has previously been seen as a task that requires a lot of data, but a recent study by DeepMind disagrees.
DeepMind’s new model, Transporter, learns abstract object-centric representations from raw video frames and can generate control policies and exploration programs using simple algorithms.
In other words, using unsupervised methods and very little data, key points can be extracted and effective control can be performed without rewards.
The effect is as follows:
Software engineer @AwokeKnowing noted that DeepMind also rigorously discusses the limitations of the work at the end of the paper, but that this research, done in an unsupervised setting without hand-engineered features, is genuinely groundbreaking.
New Transporter Architecture
In the paper Unsupervised Learning of Object Keypoints for Perception and Control, researchers proposed a new neural network architecture called Transporter that can learn the state of object keypoints across a variety of commonly used reinforcement learning environments.
The architecture of Transporter is as follows:
The researchers said in the paper that the model transforms an original video frame (xt) into another target frame (xt') by exploiting the movement of objects to discover key points.
This learning process is divided into three stages.
During training, convolutional neural networks compute the spatial feature maps Φ(xt) and Φ(xt') and a keypoint network (PointNet) predicts the keypoint coordinates Ψ(xt) and Ψ(xt'), which are used to reconstruct the target frame. In the process, the keypoint coordinates are converted into Gaussian heatmaps HΨ(xt) and HΨ(xt').
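The conversion from keypoint coordinates to Gaussian heatmaps can be sketched as follows. This is a minimal NumPy illustration, not the paper's code; the function name and the fixed standard deviation are illustrative assumptions:

```python
import numpy as np

def gaussian_heatmaps(keypoints, height, width, sigma=2.0):
    """Render K keypoint coordinates as K Gaussian heatmaps.

    keypoints: array of shape (K, 2) holding (y, x) pixel coordinates.
    Returns an array of shape (K, height, width), peaking at 1.0
    at each keypoint location.
    """
    ys = np.arange(height, dtype=np.float64)[:, None]  # column of y coords, (H, 1)
    xs = np.arange(width, dtype=np.float64)[None, :]   # row of x coords, (1, W)
    maps = []
    for ky, kx in keypoints:
        # Squared distance of every pixel to this keypoint, via broadcasting.
        sq_dist = (ys - ky) ** 2 + (xs - kx) ** 2
        maps.append(np.exp(-sq_dist / (2.0 * sigma ** 2)))
    return np.stack(maps)

heatmaps = gaussian_heatmaps(np.array([[8.0, 8.0], [4.0, 12.0]]), 16, 16)
print(heatmaps.shape)  # (2, 16, 16)
```

Each heatmap acts as a soft spatial mask centered on one keypoint, which is what makes the transport step below differentiable.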
During transport, the network performs two operations:
First, the source-frame features Φ(xt) are suppressed (set to 0) at the keypoint locations of both HΨ(xt) and HΨ(xt'); second, the target-frame features Φ(xt') are pasted in at the target keypoint locations HΨ(xt').
In the final refinement stage, the network completes two more tasks: inpainting the missing features at the original keypoint locations and cleaning up the image around the target locations.
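The transport operation described above can be sketched in a few lines of NumPy. This is an illustrative simplification under the assumption that the heatmaps have been summed over keypoints into a single mask per frame; the function name is hypothetical:

```python
import numpy as np

def transport(phi_s, phi_t, h_s, h_t):
    """Sketch of Transporter's feature-transport step.

    phi_s, phi_t: feature maps of the source/target frame, shape (C, H, W).
    h_s, h_t: combined keypoint heatmaps of each frame, shape (H, W), in [0, 1].
    """
    # Suppress source features at the keypoint locations of BOTH frames,
    # then paste in target-frame features at the target keypoint locations.
    return (1.0 - h_s) * (1.0 - h_t) * phi_s + h_t * phi_t

# Toy usage: constant feature maps, one target keypoint at pixel (1, 1).
C, H, W = 1, 4, 4
phi_s = np.ones((C, H, W))
phi_t = 2.0 * np.ones((C, H, W))
h_s = np.zeros((H, W))
h_t = np.zeros((H, W))
h_t[1, 1] = 1.0
out = transport(phi_s, phi_t, h_s, h_t)
# At (1, 1) the output takes the target frame's feature value;
# elsewhere it keeps the source frame's.
```

Because gradients flow to the target frame only through the keypoint locations, the keypoint network is pushed to place keypoints on the parts of the scene that actually move.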
The researchers visualized the extracted key points and compared them with previous state-of-the-art keypoint extraction methods from Jakab et al. and Zhang et al.:
Jakab et al.: Unsupervised learning of object landmarks through conditional image generation
Address: http://sina.lt/guuH
Zhang et al.: Unsupervised discovery of object landmarks as structural representations
Address: https://arxiv.org/abs/1804.04412
The researchers found that Transporter learned more spatially aligned key points and was robust to variation in the number, size, and motion of objects.
Using the learned key points as state inputs, the researchers obtained better policies than state-of-the-art reinforcement learning methods on several Atari environments, with only 100k environment interactions.
DeepMind Team
The research comes from Tejas Kulkarni, Ankush Gupta, Catalin Ionescu, Sebastian Borgeaud, Malcolm Reynolds, Andrew Zisserman and Volodymyr Mnih of DeepMind.
First author Tejas Kulkarni is currently a senior research scientist at DeepMind. He earned his PhD at MIT, focusing on vision, deep reinforcement learning agents, and the language of intelligent agents.
His papers have been accepted at top conferences such as CVPR 2017, NIPS 2017, and ICML 2018.
Portal
Unsupervised Learning of Object Keypoints for Perception and Control
https://arxiv.org/abs/1906.11883
https://twitter.com/deepmindai/status/1145677732115898368?s=21
-over-