Qualcomm's CVPR research: video-processing computation cut by 78% by teaching convolutional layers to "pick pixels" on their own, so video calls stop freezing into slideshows
Xiao Xiao, reporting from Aofei Temple
QbitAI Report | WeChat official account QbitAI
In imaging, there seems to be no limit to what AI algorithm researchers can do.
And as the video industry booms, video algorithms are becoming a new focus of computer vision research.
After all, everyday scenarios like video calls and live-streamed online classes all rely on heavy video-processing algorithms.
If those algorithms underperform, the video stutters and the resolution drops, making for a miserable experience.
(Imagine a video call that freezes into a slideshow. Infuriating...)
That's why cutting the computational cost of video algorithms has long been a research focus for computer vision experts worldwide.
Recently, two CVPR 2021 papers have drawn plenty of attention in the video community.
They teach the model to "save compute" on its own, making video-processing algorithms several times more efficient without sacrificing accuracy!
Teaching AI to save compute on its own: 78% less computation
Processing videos with convolutional neural networks is actually a computationally intensive task.
The "amount of computation" here does not refer to the size of the video, but the way convolution processes images - "scanning" the image completely.
But real videos are full of scenes that barely change (sometimes only a hand moves across 10 frames):
Reprocessing every single pixel in that case... you can practically smell the GPU burning.
So, is it possible to teach AI to be “lazy” efficiently and not waste any extra computing power?
It can, of course, and there are two ways.
The first paper proposes a new convolutional layer, Skip-Convolutions, which subtracts consecutive frames and convolves only the parts that changed.
Yes, just like the human eye, it picks up on the "moving parts" first.
The computation promptly dropped from 10.2 GMACs (1 GMAC = 10^9 multiply-accumulate operations) to 0.4 GMACs, less than 4% of the original!
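To make the trick concrete, here is a minimal PyTorch sketch of the skip-convolution principle. It is not the authors' code: the fixed gating threshold and the dense compute are simplifying assumptions (the paper learns its gates, and real savings require sparse kernels). The identity it exploits is exact, though: convolution is linear, so conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1}).

```python
import torch
import torch.nn as nn

class SkipConv2d(nn.Module):
    """Sketch of a skip convolution: reuse the previous frame's output and
    convolve only the residual between consecutive frames."""

    def __init__(self, in_ch, out_ch, kernel_size=3, threshold=0.1):
        super().__init__()
        # bias=False keeps the linearity identity exact in this sketch
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.threshold = threshold  # assumed fixed gate; the paper learns it
        self.prev_in = None   # x_{t-1}
        self.prev_out = None  # conv(x_{t-1})

    def forward(self, x):
        if self.prev_out is None:
            out = self.conv(x)  # first frame: full convolution
        else:
            residual = x - self.prev_in
            # Gate: keep only pixels that changed noticeably; elsewhere the
            # residual is ~0 and the previous output is simply reused.
            mask = (residual.abs().amax(dim=1, keepdim=True)
                    > self.threshold).float()
            # Dense conv here for clarity; the real speedup comes from
            # running sparse kernels only at the masked (changed) locations.
            out = self.prev_out + self.conv(residual * mask)
        self.prev_in, self.prev_out = x.detach(), out.detach()
        return out

# Toy usage: process an 8-frame random "video" clip frame by frame
layer = SkipConv2d(3, 16)
clip = torch.rand(8, 1, 3, 64, 64)  # T x B x C x H x W
outputs = [layer(frame) for frame in clip]
```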
Note that this convolutional layer can slot into any neural network: optical flow, semantic segmentation, classification, and so on, not just the pose estimation shown above.
On a recent semantic segmentation task, it cut computation by 78% and latency by 65% compared with the classic video AI baseline HRNet, with no loss in performance.
The second paper takes a different route: letting the AI model "control its own compute budget."
It proposes a network called FrameExit, built from a cascade of classifiers, which varies how much of the model is used according to each video frame's complexity.
When consecutive frames differ a lot, the full model runs; when they barely differ, only part of it does.
In other words, a frame that doesn't look like it needs heavy computation simply gets a smaller model.
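Here is a rough sketch of that early-exit idea. The real FrameExit uses learned gating modules and accumulated temporal features; everything below, from the stage sizes to the softmax-confidence exit rule, is an illustrative assumption:

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Cascade of stages, each with a small classifier head; inference stops
    at the first head that is confident enough (assumes batch size 1)."""

    def __init__(self, num_classes=10, channels=(16, 32, 64),
                 conf_threshold=0.9):
        super().__init__()
        self.stages, self.heads = nn.ModuleList(), nn.ModuleList()
        in_ch = 3
        for ch in channels:
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU()))
            self.heads.append(nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(ch, num_classes)))
            in_ch = ch
        self.conf_threshold = conf_threshold  # assumed exit rule

    def forward(self, x):
        logits = None
        for i, (stage, head) in enumerate(zip(self.stages, self.heads)):
            x = stage(x)
            logits = head(x)
            confidence = logits.softmax(dim=-1).amax().item()
            if confidence >= self.conf_threshold:
                return logits, i          # easy frame: exit early, save compute
        return logits, len(self.stages) - 1  # hard frame: ran the full model

# Toy usage: an "easy" frame may exit after the first stage
net = EarlyExitNet().eval()
with torch.no_grad():
    logits, exited_at = net(torch.rand(1, 3, 64, 64))
```

The saving comes from stages that never run: a confident exit at stage 1 skips the cost of every later stage entirely.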
Compared with other models, this method improves efficiency by up to 5×.
Meanwhile, detection accuracy (mAP) didn't drop at all; it actually went up!
The second paper has been accepted as an oral presentation at CVPR 2021.
Notably, the company behind both papers is Qualcomm, a name that matters to every smartphone user.
It looks like we'll have access to more powerful mobile video apps.
Mobile video applications: performance multiplied
Qualcomm is already working to put these two AI video-perception technologies into practice.
And the deployment targets are exactly the everyday mobile video scenarios we can't do without.
Beyond optimizing video-processing algorithms themselves, this kind of perception technology also lets more AI video models run on phones.
First, optimizing the video-processing algorithms.
Take common real-time scenarios such as online meetings and online classes: if the video-processing model is weak, call quality suffers badly.
Calls may even stutter and drop frames, making the experience worse than a plain voice call.
With this kind of video-perception technology, though, the AI can process only the pixels that matter, sharply reducing the computation a video call requires and keeping the stream smooth.
Another example: when phones run smart editing on video files, they tend to drain the battery and load files slowly.
Apply these algorithms to video-editing apps, and both the underlying processing and the editing experience become smoother.
Second, it is precisely this kind of video-perception algorithm that lets more AI models reach mobile phones in the first place.
Take the Xiaomi 11. One of its video-editing features freezes part of the scene while the rest keeps playing, as if one person cast a "time stop" spell on another.
Video models like this used to demand so much computation that the original papers ran them on desktop GPUs; now a phone can pull off "time stop" in real time:
And not just a whole clip; even specific frames within it can be paused to produce some very entertaining footage:
Likewise, the image-enhancement algorithms common in major AI vision papers were previously built mainly for still photography and couldn't be applied to video.
Now, with video algorithms needing far less compute, they can run during real-time video capture, even in scenarios like video conferencing.
Take night-mode video on the OPPO Find X3 Pro: with AI doing the processing, faces stay clearly visible even in backlit or night scenes:
Even familiar features like smart video stabilization and frame interpolation only made it to phone video because video-perception algorithms enable techniques such as intelligent inter-frame comparison and super-resolution.
For example, here is the smart stabilization effect on the vivo X60 Pro+:
In fact, all of these AI "black tech" features already shipping on phones are backed by the computing power and processing performance of the Snapdragon 888.
In other words, Qualcomm has turned many AI video processing algorithms from "a few sheets of paper" into actual mobile video applications.
The "invisible" AI black tech all around us
Phones aren't the only products steadily "leveling up" with the help of these algorithms.
Behind gradually materializing "future" scenarios such as smart healthcare, smart factories, and XR stand countless pieces of AI black tech as well.
Take the now-familiar VR headset: with AI algorithms on board, its cameras can deliver more accurate inside-out tracking.
Paired with 5G video transmission, AI-powered VR devices can give children science lessons and let doctors walk patients through their condition in greater detail.
For example, at the hospital today, a single code can gather all the relevant medical information: records, treatment progress, and the latest results.
One scan with the "Xiaoma Ge" device developed by Dongda Integrated, and doctors can pull up everything and make a timely diagnosis.
Meanwhile, IoT medical devices plus AI data analysis simplify health monitoring and build a truly "connected" hospital, letting patients check their results promptly from anywhere, at any time.
Likewise, AI + edge computing + 5G can build intelligent digital production lines that replace the human eye for quality inspection and defect detection, saving factories substantial labor costs.
Industrial handling robots, too, can use 5G + AI to analyze camera video streams in the cloud or at the edge, enabling remote control.
But users don't need to understand every detail.
Because cutting-edge technology companies like Qualcomm are overcoming these technical difficulties one by one.
△ Qualcomm's AI application layout
These breakthroughs are then packaged into products, so that every user can enjoy the latest technology without exception.
How complicated is black technology?
That's not something most users need to consider.
The two CVPR 2021 papers:
[1] https://arxiv.org/abs/2104.11487
[2] https://arxiv.org/abs/2104.13400
- End -
This article is original content from [QbitAI], a signed account in the NetEase News • NetEase Hao featured-content incentive program. Reproduction without authorization is prohibited.