Tsinghua University uses 6 wireless sensors to capture full-body motion, allowing users to run, jump and roll

Last updated: 2021-06-07

By Mengchen, reporting from Aofei Temple
QbitAI report | Official account: QbitAI

A research team from Tsinghua University released a video:

The precise, smooth movements of two people playing basketball are captured on the laptop in the lower right corner.

But there were no cameras in the room, and the two men didn’t seem to be wearing any equipment?

In my mind, the equipment and venue for full-body 3D motion capture look like this:

△ Optical motion capture, on the set of "The Last of Us"

Or something like this:

△ Inertial motion capture

The lightest suits weigh 2.5 to 3 kg (5 to 6 catties), and cheaper ones are heavier still, up to 10 kg. Wearing one restricts movement and tires you out quickly, so it is basically unsuitable for everyday use.

Later in the video the equipment is highlighted. Each person wears only 6 small inertial sensors, and they are wireless.

VR devices currently on the market mainly use optical motion capture.

When VR first came out, whether wired or wireless, the biggest obstacle was having to set up 3 to 6 posts around the room.

Later this was simplified to a camera on the headset that scans the surroundings for positioning, plus inertial sensors in the two controllers, such as PSVR.

However, motion capture is still limited to the head and hands, and leg movements have always been a problem. Locomotion can only be shown as the view moving forward; the actual leg movements cannot be shown in the game.

△ One of the ways to move in the VR game "Half-Life: Alyx"

Or, as with a fitness ring, the game simply detects that you are lifting your legs and plays a fixed leg animation, or it relies on in-game vehicles to move you around.

Now, Tsinghua University's new method can capture jumping and squatting in real time at 90 fps:

Jumping over obstacles or even rolling over is no problem:

Besides tracking full-body movement, it can also determine the wearer's position in space. Since no fixed sensors are required, moving over long distances is no problem.

Compared with optical motion capture, inertial motion capture has two advantages. The first is that it is unaffected by occlusion from obstacles in the environment.

The second is that it places no requirements on lighting and can be used at night.

In addition to personal VR games, new inertial motion capture technology may also reduce the cost of commercial motion capture, allowing small-scale production teams to use it.

In games and animated films, motion capture studios look like this:

I'm afraid only large companies can afford this.

Beyond entertainment, motion capture is also used in medicine, where the data can help guide injured patients through rehabilitation training.

Bidirectional Recurrent Neural Network

How did they achieve this? It turns out that they rely on deep learning.

The research team split motion capture into three staged subtasks. First, the positions of 5 leaf joints (the head and the four limbs) are estimated from the inertial data. These are then refined into the positions of all 23 joints. Finally, the pose is solved with inverse kinematics (IK).
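The three stages can be pictured as a chain of functions, each feeding the next. The sketch below is only an illustration of that data flow, not the team's implementation: the toy linear maps stand in for trained networks, and the per-sensor input layout (a 3D acceleration plus a 3x3 orientation matrix), the idea of passing the raw IMU data to every stage, and the 3 pose parameters per joint are all assumptions.

```python
import random

NUM_IMUS = 6          # from the article: six wireless inertial sensors
NUM_LEAF_JOINTS = 5   # head and four limbs
NUM_JOINTS = 23       # full joint set produced by the second stage

# Assumption: each IMU contributes a 3D acceleration and a 3x3 orientation matrix.
IMU_DIM = NUM_IMUS * (3 + 9)


def make_stage(in_dim, out_dim, seed):
    """Return a toy linear map standing in for a trained network (hypothetical)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def stage(x):
        assert len(x) == in_dim, "unexpected input size"
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return stage


# Stage 1: IMU data -> positions of the 5 leaf joints.
leaf_stage = make_stage(IMU_DIM, NUM_LEAF_JOINTS * 3, seed=1)
# Stage 2: leaf positions (plus IMU data, an assumption) -> all 23 joint positions.
joint_stage = make_stage(NUM_LEAF_JOINTS * 3 + IMU_DIM, NUM_JOINTS * 3, seed=2)
# Stage 3: joint positions (plus IMU data, an assumption) -> pose parameters,
# i.e. the IK-like step. Three parameters per joint is an assumption.
pose_stage = make_stage(NUM_JOINTS * 3 + IMU_DIM, NUM_JOINTS * 3, seed=3)


def estimate_pose(imu_frame):
    """Run one frame of IMU data through the three stages in order."""
    leaf_positions = leaf_stage(imu_frame)
    joint_positions = joint_stage(leaf_positions + imu_frame)
    pose = pose_stage(joint_positions + imu_frame)
    return leaf_positions, joint_positions, pose


leaf, joints, pose = estimate_pose([0.0] * IMU_DIM)
print(len(leaf), len(joints), len(pose))
```

Splitting the problem this way means each stage solves a smaller regression, which is consistent with the article's later point that task decomposition helps reach a higher frame rate with fewer resources.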

Because predicting a continuous motion depends not only on earlier frames but also on later ones, a bidirectional recurrent neural network (biRNN) is used in this step.
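To make the idea of a bidirectional recurrent network concrete, here is a minimal sketch with a single tanh unit per direction. It is not the network from the paper: the weights are arbitrary illustrative values and the function names are hypothetical. It only shows that the output for a frame combines a state built from past frames with a state built from future frames.

```python
import math


def rnn_pass(frames, w_in, w_rec):
    """Run a one-unit tanh RNN over the frames; return the hidden state per frame."""
    hidden, states = 0.0, []
    for x in frames:
        hidden = math.tanh(w_in * x + w_rec * hidden)
        states.append(hidden)
    return states


def birnn(frames, fwd=(0.8, 0.5), bwd=(0.6, 0.4)):
    """Pair each frame's forward state (past context) with its backward state
    (future context). The weights are arbitrary, untrained values."""
    forward = rnn_pass(frames, *fwd)
    # The backward direction reads the sequence from the end, then is flipped
    # back so that index t lines up with frame t again.
    backward = rnn_pass(frames[::-1], *bwd)[::-1]
    return list(zip(forward, backward))


signal = [0.1, 0.4, -0.2, 0.3]
changed_future = signal[:-1] + [2.0]  # only the last frame differs

a = birnn(signal)
b = birnn(changed_future)

# The forward state at frame 0 ignores the future; the backward state does not.
print("forward state unchanged:", a[0][0] == b[0][0])
print("backward state unchanged:", a[0][1] == b[0][1])
```

In this toy run, changing only the last frame leaves the forward state of the first frame untouched but alters its backward state, which is exactly the extra context a biRNN offers over a one-directional RNN.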

The spatial positioning problem is also split into two parts. One estimates the probability that each foot is in contact with the ground; this is then combined with the velocity of the root node to compute the velocity in world coordinates. This step also uses an RNN and a biRNN.
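One way to picture how a foot-contact probability and a root velocity estimate might be combined is the sketch below. It is an assumption-laden illustration rather than the paper's exact formula: the function name, the linear blending rule, and the thresholds `low` and `high` are hypothetical. The underlying idea is that if the supporting foot stays fixed in the world, the root must move by the opposite of the foot's motion relative to the root.

```python
def root_displacement(contact_probs, prev_foot_rel, cur_foot_rel, est_step,
                      low=0.5, high=0.9):
    """Blend a foot-contact estimate with a directly estimated root step.

    contact_probs: (left, right) probabilities that each foot touches the ground.
    prev_foot_rel, cur_foot_rel: per-foot 3D positions relative to the root
        at the previous and current frame.
    est_step: per-frame root displacement from a separate estimator.
    low, high: hypothetical thresholds for trusting the contact estimate.
    """
    # Pick the foot most likely to be on the ground (the supporting foot).
    foot = 0 if contact_probs[0] >= contact_probs[1] else 1
    prob = contact_probs[foot]

    # Fixed foot in the world: root_t + rel_t == root_prev + rel_prev,
    # so the root displacement is rel_prev - rel_t.
    foot_based = [p - c for p, c in zip(prev_foot_rel[foot], cur_foot_rel[foot])]

    # Trust the foot-based estimate fully above `high`, not at all below `low`,
    # and interpolate linearly in between.
    weight = min(1.0, max(0.0, (prob - low) / (high - low)))
    return [weight * f + (1.0 - weight) * e for f, e in zip(foot_based, est_step)]


# Left foot firmly planted: the result follows the foot-based estimate.
planted = root_displacement(
    contact_probs=(0.95, 0.10),
    prev_foot_rel=[(0.0, -0.9, 0.10), (0.0, -0.9, -0.10)],
    cur_foot_rel=[(0.0, -0.9, 0.08), (0.0, -0.8, -0.05)],
    est_step=(0.0, 0.0, 0.5),
)

# Both feet in the air (a jump): the result falls back to the estimated step.
airborne = root_displacement(
    contact_probs=(0.10, 0.20),
    prev_foot_rel=[(0.0, -0.9, 0.10), (0.0, -0.9, -0.10)],
    cur_foot_rel=[(0.0, -0.7, 0.10), (0.0, -0.7, -0.10)],
    est_step=(0.0, 0.03, 0.01),
)
print(planted, airborne)
```

Accumulating these per-frame displacements over time would give the wearer's trajectory without any fixed external sensors.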

Different public datasets are used to train the different subtasks, containing pose and spatial position parameters from 300 subjects and more than 40 hours of motion.

Compared with previous studies, the task decomposition approach helps achieve a higher frame rate with fewer resources and can capture high-speed motion.

It also achieves spatial positioning while capturing motion.

However, two shortcomings remain. First, capture quality depends on the training dataset, and results are generally poor for motions that are not in the training set.

Second, when estimating the probability of foot-ground contact, the method assumes that a foot in contact with the ground stays fixed, which does not hold for activities such as skateboarding.

Author Team

The paper of this project has been accepted by SIGGRAPH 2021, the top conference in computer graphics.

The research team is from Tsinghua University's Beijing National Research Center for Information Science and Technology and School of Software.

The work comes from Associate Professor Xu Feng's team, and the first author is Yi Xinyu.

Project address:
https://xinyu-yi.github.io/TransPose/

Paper address:
https://arxiv.org/abs/2105.04605

-over-

This article is original content from QbitAI, a signed account under NetEase News and NetEase's special content incentive plan. Reproduction without the account's authorization is prohibited.
