Kaiming He is on the CVPR 2021 award list, and 4 of the "best" papers have Chinese authors
Xiaocha from Aofei Temple
Quantum Bit Report | Public Account QbitAI
CVPR 2021 officially kicked off this week. As the most important academic conference in computer vision, its best paper awards naturally draw close attention from scholars in the field.
Just last week, CVPR officially announced a list of 32 best paper candidates, 16 of which have Chinese first authors, from domestic schools and institutions including Peking University, Tencent, and SenseTime.
So which papers won the honors? Early this morning, on the first day of the conference, the official results were announced:
Among them, one paper won the Best Paper Award and one won the Best Student Paper Award; two papers received Best Paper nominations and three received Best Student Paper nominations.
Of these 7 papers, 4 have Chinese authors, and the familiar name of Kaiming He appears among them.
7 award-winning papers
Best Paper Award
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
This paper comes from two scholars at the University of Tübingen in Germany.
Summary:
This paper proposes that incorporating compositional 3D scene representations into generative models leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields makes it possible to disentangle one or more objects from the background, as well as the shape and appearance of each individual object, while learning from unstructured and unposed image collections without any additional supervision.
Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model that can disentangle individual objects, translate and rotate them in the scene, and change the camera pose.
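The composition step is the heart of the method: each object (and the background) gets its own feature field, and the fields are combined by density-weighted averaging. Below is a minimal PyTorch sketch of that composition operator as described in the paper; the tensor shapes are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def compose_feature_fields(sigmas, feats):
    """Combine N per-object feature fields into one scene-level field.

    sigmas: (N, P) densities from each object's field at P sampled points.
    feats:  (N, P, C) features from each object's field at the same points.
    """
    sigma = sigmas.sum(dim=0)                          # total scene density
    weights = sigmas / sigma.clamp(min=1e-8)           # density weights
    feat = (weights.unsqueeze(-1) * feats).sum(dim=0)  # weighted mean feature
    return sigma, feat
```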
Paper address:
https://arxiv.org/abs/2011.12100
Source code:
https://github.com/autonomousvision/giraffe
Best Paper Nominations
This year, Kaiming He received a Best Paper nomination. The paper is:
Exploring Simple Siamese Representation Learning
Summary:
In this paper, the authors show that simple Siamese networks can learn meaningful representations even without using any of the following: (i) negative sample pairs, (ii) large batches, or (iii) momentum encoders.
The experiments show that collapsing solutions do exist for the loss and structure, but that a stop-gradient operation plays an essential role in preventing collapse. The authors offer a hypothesis on what the stop-gradient implies and present proof-of-concept experiments validating it.
The "SimSiam" method achieves competitive results on ImageNet and downstream tasks. The authors hope this simple baseline will motivate people to rethink the role of Siamese architectures in unsupervised representation learning.
In addition, Kaiming He said the code for the paper would be released soon.
The first author of the paper is Xinlei Chen, who received his bachelor's degree from Zhejiang University and his Ph.D. from Carnegie Mellon University. Like Kaiming He, he now works at Facebook AI Research.
Paper address:
https://arxiv.org/abs/2011.10566
Source code:
https://github.com/facebookresearch/simsiam
The other nominated paper comes from two scholars at the University of Minnesota.
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
Summary:
A key challenge in learning the geometry of dressed humans is the limited availability of ground-truth data, which leads to degraded performance of 3D human reconstruction when applied to real-world images.
This paper addresses the challenge by leveraging a new data source: social media dance videos, which span diverse appearances, clothing styles, performances, and identities. Each video depicts the dynamic movement of a person's body and clothes but lacks 3D ground-truth geometry.
To exploit these videos, the authors propose a novel method that uses a local transformation to warp the predicted local geometry of a person from one image to that of another image at a different time instant. The method is end-to-end trainable and produces high-fidelity depth estimates with fine geometry faithful to the input real-world image. Experiments show that it outperforms state-of-the-art human depth estimation and human shape recovery methods on both real and rendered images.
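To get a feel for the idea, here is an illustrative PyTorch sketch of a warping-consistency loss: predicted geometry in one frame is lifted to 3D, moved with a local rigid transform, projected into another frame, and compared with that frame's prediction. This is only a simplified stand-in for the authors' formulation; the intrinsics `K` and the per-region transform `(R, t)` are assumed given.

```python
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # depth: (H, W) predicted depth map; K_inv: (3, 3) inverse intrinsics.
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    return (pix @ K_inv.T) * depth.unsqueeze(-1)                   # 3D points

def warp_consistency_loss(depth_a, depth_b, R, t, K):
    # Lift frame A's predicted geometry to 3D, move it with the local
    # rigid transform (R, t), project into frame B, and compare against
    # frame B's predicted depth at the projected pixels.
    pts = backproject(depth_a, torch.inverse(K)).reshape(-1, 3)
    pts_b = pts @ R.T + t                       # geometry expressed in frame B
    uv = pts_b @ K.T
    uv = uv[:, :2] / uv[:, 2:].clamp(min=1e-6)  # perspective projection
    H, W = depth_b.shape
    grid = torch.stack([uv[:, 0] / (W - 1),     # normalize to [-1, 1]
                        uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(depth_b[None, None], grid[None, None],
                            align_corners=True).reshape(-1)
    # Out-of-frame points sample zero here; a real loss would mask them.
    return F.l1_loss(pts_b[:, 2], sampled)
```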
Paper address:
https://arxiv.org/abs/2103.03319
Best Student Paper Award
Task Programming: Learning Data Efficient Behavior Representations
The authors are from Caltech and Northwestern University.
Summary:
In-depth behavior analysis usually requires training sets accurately annotated with specialized domain knowledge, but acquiring such annotations from domain experts is tedious and time-consuming. This problem is particularly prominent in automated behavior analysis.
To reduce the annotation burden, this paper proposes TREBA: a method for learning annotation-sample-efficient trajectory embeddings for behavior analysis, based on multi-task self-supervised learning. The tasks in this method can be efficiently engineered by domain experts through a process called "task programming". By exchanging some data-annotation time for the construction of a small number of programmed tasks, the total effort required from domain experts can be reduced.
The paper presents experimental results on three datasets across two domains, showing that the method reduces the annotation burden by up to a factor of 10 without compromising accuracy compared to state-of-the-art methods.
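"Task programming" means a domain expert writes short programs that compute behaviorally meaningful quantities from a trajectory, and these become decoding targets in the multi-task self-supervised objective. The tasks below are invented purely for illustration; they are not the paper's actual programs.

```python
import numpy as np

def task_mean_speed(traj):
    # traj: (T, 2) array of x-y positions over time
    return np.linalg.norm(np.diff(traj, axis=0), axis=1).mean()

def task_mean_dist_to_center(traj, center=(0.5, 0.5)):
    return np.linalg.norm(traj - np.asarray(center), axis=1).mean()

# A domain expert "programs" tasks by adding functions like the above.
PROGRAMMED_TASKS = [task_mean_speed, task_mean_dist_to_center]

def task_targets(traj):
    # Targets that the trajectory embedding's task decoders must predict,
    # alongside the self-supervised reconstruction objective.
    return np.array([task(traj) for task in PROGRAMMED_TASKS])
```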
It is worth mentioning that the paper's first author, Jennifer J. Sun, is currently a Ph.D. student at the California Institute of Technology; she received her undergraduate degree from the University of Toronto with a 4.0 GPA.
Paper address:
https://arxiv.org/abs/2011.13917
Source code:
https://github.com/neuroethology/TREBA
Best Student Paper Nominations
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
The paper is from the University of North Carolina at Chapel Hill.
This paper mainly studies the video question answering (VQA) problem.
The authors propose a general framework, ClipBERT, that enables affordable end-to-end learning for video-and-language tasks by employing sparse sampling: at each training step, only one or a few sparsely sampled short clips from a video are used.
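The sparse-sampling idea is simple to sketch. A hypothetical helper like the one below draws a few short clips per video per training step instead of densely decoding the whole video; per-clip predictions are then aggregated (e.g., averaged).

```python
import random

def sample_sparse_clips(num_frames, num_clips=2, clip_len=16):
    # Assumes num_frames comfortably exceeds clip_len + num_clips.
    starts = random.sample(range(num_frames - clip_len), k=num_clips)
    return [list(range(s, s + clip_len)) for s in sorted(starts)]

# e.g. for a 300-frame video: two random 16-frame clips per training step
clips = sample_sparse_clips(300)
```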
Paper address:
https://arxiv.org/abs/2102.06183
Source code:
https://github.com/jayleicn/ClipBERT
Binary TTC: A Temporal Geofence for Autonomous Navigation
The paper is from NVIDIA and UC Santa Barbara.
The paper studies a problem relevant to autonomous driving: time-to-contact (TTC), the time until an object collides with the observer's plane. TTC is a powerful tool for path planning, potentially providing more information than the depth, velocity, and acceleration of objects in the scene.
TTC has several advantages, including requiring only a single monocular, uncalibrated camera. However, regressing TTC for each pixel is not straightforward, and most existing methods make overly simplified assumptions about the scene. This paper addresses the challenge by estimating TTC through a series of simpler binary classifications, and is the first method to provide TTC information at sufficiently high frame rates.
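Conceptually, instead of regressing a continuous TTC per pixel, the network answers a stack of easier binary questions: "will this pixel's object make contact within τ seconds?" for several thresholds τ. Here is a hedged sketch of how such binary maps could be decoded into a per-pixel TTC estimate; this is an illustration of the idea, not the paper's exact decoding.

```python
import torch

def ttc_from_binary_maps(binary_maps, taus):
    # binary_maps: (K, H, W), entry k is 1 where the classifier says
    # "contact within taus[k] seconds". Take the smallest firing
    # threshold as the TTC estimate; where none fire, fall back to the
    # largest threshold as a lower bound.
    K, H, W = binary_maps.shape
    taus = torch.as_tensor(taus, dtype=torch.float32).view(K, 1, 1)
    candidate = torch.where(binary_maps.bool(),
                            taus.expand(K, H, W),
                            torch.full((K, H, W), float("inf")))
    ttc = candidate.min(dim=0).values
    return torch.where(torch.isinf(ttc), taus.max().expand(H, W), ttc)
```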
Paper address:
https://arxiv.org/abs/2101.04777
Real-Time High-Resolution Background Matting
The paper is from the University of Washington.
This paper proposes a method for real-time high-resolution background replacement of videos, capable of running at 30fps at 4K resolution.
The main challenge is computing a high-quality alpha matte that preserves hair-level details while processing high-resolution video in real time. To achieve this, the authors use two neural networks: a base network computes a low-resolution result, which a second network then refines at high resolution on selectively chosen patches.
This method produces higher quality results than previous methods while achieving significant improvements in speed and resolution.
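A rough sketch of that coarse-to-fine split is below (heavily simplified; `base_net` and `refine_net` are placeholder modules, and scattering refined patches back into the result is elided).

```python
import torch.nn.functional as F

def two_stage_alpha(base_net, refine_net, image, background):
    # image, background: (N, 3, H, W) frame and clean background plate.
    # Stage 1: run at 1/4 resolution to get a coarse alpha matte plus a
    # per-pixel error map indicating where refinement is needed.
    small = F.interpolate(image, scale_factor=0.25, mode="bilinear",
                          align_corners=False)
    small_bg = F.interpolate(background, scale_factor=0.25, mode="bilinear",
                             align_corners=False)
    coarse_alpha, error_map = base_net(small, small_bg)
    coarse_up = F.interpolate(coarse_alpha, size=image.shape[-2:],
                              mode="bilinear", align_corners=False)
    # Stage 2: refine only the highest-error patches at full resolution
    # (extracting those patches, running refine_net on them, and
    # scattering the results back into coarse_up is omitted here).
    patch_error = F.avg_pool2d(error_map, kernel_size=4)
    return coarse_up, patch_error
```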
The project code has received 3.7k stars on GitHub.
Paper address:
https://arxiv.org/abs/2012.07810
Source code:
https://github.com/PeterL1n/BackgroundMattingV2
PAMITC Award
In addition to the best paper awards, this year's conference also presented PAMITC awards, including the Longuet-Higgins Award, the Young Investigator Award, and the inaugural Thomas Huang Memorial Award.
The two papers that won the Longuet-Higgins Award are:
Real-Time Human Pose Recognition in Parts from Single Depth Images
Baby Talk: Understanding and Generating Simple Image Descriptions
The recipients of the Young Investigator Award are Georgia Gkioxari from FAIR and Phillip Isola from MIT.
Last year, Thomas Huang (Huang Xutao), a pioneer in the field of computer vision, passed away. In his memory, CVPR decided to present the Thomas Huang Memorial Award starting this year.
The winner of the first Thomas Huang Memorial Award is MIT computer science professor Antonio Torralba, who had four papers accepted at CVPR this year.
About this year's CVPR
Due to the impact of the COVID-19 pandemic, this year's CVPR is again being held online as a virtual conference.
This year, CVPR received 7,039 valid submissions and 1,661 papers were accepted.
At the time of CVPR, major technology companies also released their own report cards one after another. Google published more than 70 papers and Facebook published 52 papers.
In recent years, Chinese technology companies have published at CVPR at a level on par with the foreign giants: this year SenseTime published 66 papers, Huawei's Noah's Ark Lab 30, Megvii 22, Tencent Youtu 20, and Kuaishou 14.
Of course, the workshops and tutorials associated with this year's CVPR are still ongoing. Interested readers can follow along via the links below~
Reference links:
http://cvpr2021.thecvf.com/node/141
http://cvpr2021.thecvf.com/node/329
-over-
"Smart Car" Exchange Group is recruiting
Friends who are interested in the AI industry, smart cars, and autonomous driving are welcome to join the community. Don’t miss out on smart cars. Industry development & technological progress :
click here