In-depth | Facebook's first deep learning mobile platform: how is the "video version" of Prisma implemented?
Leifeng.com: "Style effect transfer" has been well known since the emergence of Prisma. It uses neural representation to separate and reassemble the content and style of the image, and finally can be used to depict artistic images. However, as we have experienced, Prisma still has some problems, such as server overload caused by a large number of users and insufficient computing power for intelligent software processing.
When publishing "The Art of AI Photo Editing: The Fantastic Algorithm Behind Prisma | In-depth", Leifeng.com editors asked several deep learning engineers why the technique had not been applied to video. The answer was that, beyond the problems Prisma already faces, keeping the stylization consistent from frame to frame is very troublesome and demands more advanced techniques.
Now, to let you feel as if you were holding Van Gogh's brush when shooting video on your phone, Facebook has built a new deep learning mobile platform that combines the Caffe2go runtime with a style transfer model to extract, analyze, and process pixel features in real time on the phone. This article was compiled by Tupu Technology engineers from Facebook's post "Delivering real-time AI in the palm of your hand" and explains how the "video version" of Prisma is achieved. Exclusive first publication by Leifeng.com.
As video calls become the most popular way to communicate, we want to give everyone the most advanced and creative tools for self-expression. We recently began testing a new creative effects camera on Facebook that helps people instantly transform ordinary videos into beautiful works of video art.
This technology is called "style transfer". It extracts artistic style and features from one image, such as the style of a Van Gogh painting, and applies it to another image or video. This technology is usually difficult to implement, and in the past, the data had to be sent to a data center to be processed on a server with higher processing power. But now it is different! We have pioneered a new deep learning mobile platform that can extract, analyze and process pixel features in real time on mobile phones. The most advanced technology is now available in your hands.
This is a full-fledged deep learning system called Caffe2go, and its framework is now embedded in our mobile applications. By compressing the AI models used to process images and videos to one hundredth of their size, the system can efficiently run various deep neural networks on iOS and Android. Our current results show that we can complete AI inference on the phone in less than one twentieth of a second, about 50 milliseconds; by comparison, a blink of an eye takes roughly one third of a second, or about 300 milliseconds.
(The image is a screenshot of a video shot with Facebook's creative effects camera; the full video can be viewed via the original link.)
The style transfer feature described above is actually a combination of two technologies: the Caffe2go runtime and the style transfer algorithm model. Because our AI teams work on both algorithms and large-scale systems, they were well placed to develop new models that serve both, making style transfer faster and higher quality. Together, the two technologies give you that wonderful feeling of holding Van Gogh's brush when shooting video on your phone.
Three months ago, we set out to do what no one had done before: ship this AI style transfer feature as a creative tool that runs in real time on people's devices. Talented people from product, engineering, and research joined the project. Justin Johnson, a member of Facebook's AI research group, is an author of a foundational paper describing the technique, which builds on earlier work in the field. Our Applied Machine Learning group had been building an AI engine that runs on mobile devices, and the creative camera team, which understands user needs well, collaborated with them on a solution that runs highly optimized neural networks in real time on mobile devices. In the sections below, we explain how we thought about and developed this technology, starting with Caffe2go.
● ● ●
Caffe2go
-
Lightweight and fast
Artificial intelligence has had a huge impact on computer science, but its heavy computation still lives in large data centers, often far from the devices that use it. As a result, any AI model that must respond in real time is limited by the latency of reaching a data center before it can run on GPUs there. Since carrying a supercomputer around is hardly practical, we wanted to make AI run on the CPU of one of the most ubiquitous devices today: the smartphone.
With Caffe2go, a smartphone can perform recognition, expression, and understanding without connecting to a remote server. Even so, smartphones have their limits. Although their computing power has improved dramatically in recent years, to the point of performing billions of arithmetic operations per second, they are still constrained in many ways: battery, memory, and the compute available for on-device intelligence. Smartphones are therefore both an opportunity and a challenge for machine learning systems.
Our answer to this challenge was to design an extremely lightweight, modular framework. To do so, we applied the Unix philosophy and built on top of the open source Caffe2 project, keeping the core framework that declares and connects components lean enough to plug in many modules, including ones optimized for mobile phones. We kept a lean algorithmic framework that lets engineers describe an abstract computation as a directed acyclic graph (DAG), while placing no constraints on what the nodes that consume inputs and produce outputs in the graph may execute. This lets our engineering teams implement and optimize modules for different platforms while still connecting them easily. When the graph actually runs, it instantiates itself against whatever hardware is available to achieve maximum speed.
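To make the DAG idea concrete, here is a minimal, illustrative sketch using the Caffe2 Python bindings: the network is only a description of operators and the blobs connecting them, and the runtime decides how each node is executed. The blob names, shapes, and random weights below are assumptions for illustration, not taken from Facebook's actual models.

```python
import numpy as np
from caffe2.python import core, workspace

# Describe the computation as a graph of operators; nothing runs yet.
net = core.Net("dag_sketch")
net.Conv(["frame", "w", "b"], "conv_out", kernel=3, pad=1)
net.Relu("conv_out", "styled")

# Feed dummy inputs and let the runtime execute the graph on whatever
# hardware and engines are available.
workspace.FeedBlob("frame", np.random.rand(1, 3, 32, 32).astype(np.float32))
workspace.FeedBlob("w", np.random.rand(8, 3, 3, 3).astype(np.float32))
workspace.FeedBlob("b", np.zeros(8, dtype=np.float32))
workspace.RunNetOnce(net)
print(workspace.FetchBlob("styled").shape)  # (1, 8, 32, 32)
```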
Because speed is key for compute-intensive mobile applications, especially for images and video, the lightweight design of the framework lets us ship platform-specific optimizations of key operators. One notable example is NNPACK, a library integrated into our mobile runtime from Caffe2; by exploiting NEON, a mobile CPU feature, it greatly speeds up on-device computation. On iOS devices we have also started integrating accelerated arithmetic through Apple's Metal language. All of this is achieved through the modular design without changing the model definition itself, so the algorithm side and the runtime side can rely on each other without worrying about compatibility risks.
-
"Developer-friendly" design
Caffe2 is also our first production-grade deep learning platform that runs at full speed on four platforms (server CPU, GPU, iOS, and Android) using exactly the same code. Thanks to the modular design, the framework uses the same language on every platform while optimizing for each platform's particular needs. These execution details are hidden from developers; for example, the framework can choose NNPACK for mobile devices (iOS and Android) or CUDNN for GPU servers. As a result, algorithm developers can focus on the algorithms themselves without having to worry about how to run the convolutions efficiently.
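To illustrate how those execution details stay out of the algorithm code, the sketch below defines the same convolution twice and changes only the engine hint; the blob names are made up, and which engines are actually available depends on how Caffe2 was built for the target platform.

```python
from caffe2.python import core

# Same operator definition; only the execution engine hint differs.
mobile_net = core.Net("style_mobile")
mobile_net.Conv(["frame", "w", "b"], "conv1", kernel=3, pad=1, engine="NNPACK")  # mobile CPU path

server_net = core.Net("style_server")
server_net.Conv(["frame", "w", "b"], "conv1", kernel=3, pad=1, engine="CUDNN")   # GPU server path
```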
The framework's fast deployment design is also a big win for developers. Debugging on a phone can be challenging, because the mobile toolchain is not as advanced as those for desktops and servers. We address this by abstracting the neural network math away from the hardware: a serialized Caffe2go network produces the same numerical output whether it is executed on a phone or on a server. We can therefore move most of the work (model training, performance testing, user experience research) into the server environment, and once everything works, deploy to the mobile environment with a single click.
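The sketch below shows, assuming the Caffe2 Python bindings are available, how a network definition can be serialized as a protobuf so the same graph can be shipped to either a phone or a server; the file and blob names are illustrative.

```python
import numpy as np
from caffe2.python import core, workspace
from caffe2.proto import caffe2_pb2

# Build a tiny network and serialize its definition (a protobuf).
net = core.Net("style_sketch")
net.Relu("input", "output")
with open("style_sketch.pb", "wb") as f:
    f.write(net.Proto().SerializeToString())

# Later, on a phone or a server, load and run the exact same definition;
# given the same input, the numerical output should match.
net_def = caffe2_pb2.NetDef()
with open("style_sketch.pb", "rb") as f:
    net_def.ParseFromString(f.read())
workspace.FeedBlob("input", np.random.randn(1, 4).astype(np.float32))
workspace.RunNetOnce(net_def)
print(workspace.FetchBlob("output"))
```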
-
Training and testing of style transfer models
The idea of "style transfer" actually existed a long time ago. It was first proposed by a research team in a seminal paper titled "A Neural Algorithm for Artistic Style" published in August 2015. However, the technology was slow to develop and required powerful servers to support it. In the following months, the research team improved and perfected the technology, increasing its running speed by several levels, but it still relied heavily on the computing power on the server.
Now we can run artificial intelligence quickly on mobile devices, but in order to ensure a high-quality, high-resolution real-time image style transfer experience, we still need to continue to optimize and improve the model.
-
Optimizing for an efficient model size
Traditional style transfer models, including feed-forward variants, are large and slow. Our goal for the style transfer application was a new, lighter, and more efficient model that could output high-quality video at more than 20 frames per second, without dropped frames, on an iPhone 6s.
We took three main approaches to model compression.
We optimized the number of convolutional layers (the most time-consuming part of the processing) and the width of each layer, and also adjusted the spatial resolution used during processing. The number of convolutional layers and their width are separate levers for controlling processing time: they determine how much of the image is processed at once and how many separate processing passes are needed. For spatial resolution, we can change the size at which the intermediate layers actually operate. Using early downsampling convolutions (shrinking the image being processed) and late deconvolution (scaling the processed image back up) means the system handles less information and runs faster. With these techniques we could significantly reduce the width and depth of the network while preserving reasonably good image quality.
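As a rough illustration of the early-downsampling, narrow-body, late-upsampling idea, here is a minimal sketch written with PyTorch; the layer counts, widths, and shapes are assumptions for illustration, not Facebook's actual production architecture.

```python
import torch
import torch.nn as nn

class TinyStyleNet(nn.Module):
    """Illustrative only: downsample early, do the work at low resolution, upsample late."""
    def __init__(self, width=16):
        super().__init__()
        self.down = nn.Sequential(  # early strided convolutions shrink the frame 4x
            nn.Conv2d(3, width, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.body = nn.Sequential(  # narrow, shallow body working at low resolution
            nn.Conv2d(width * 2, width * 2, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width * 2, width * 2, 3, padding=1), nn.ReLU(),
        )
        self.up = nn.Sequential(    # late deconvolutions scale back to the input size
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(width, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.up(self.body(self.down(x)))

frame = torch.randn(1, 3, 256, 256)      # one video frame
out = TinyStyleNet()(frame)              # out.shape == (1, 3, 256, 256)
```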
-
Improve image quality
Image quality is subjective and hard to measure precisely, especially for something like style transfer, so we built visualization tools, including A/B tests, to check that the different trained models produced the highest-quality results. Using a large GPU cluster driven by FBLearner Flow, we could quickly sweep a wide range of hyperparameters, such as model architecture, content and style weights, and downsampling factors, to find well-trained feed-forward models that hit the speed target while maintaining or improving image quality.
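A sweep of this kind might look like the minimal sketch below; the parameter names, value ranges, and the placeholder train_and_score function are purely illustrative assumptions, not the actual FBLearner Flow workflow.

```python
from itertools import product

# Candidate hyperparameter values (illustrative only).
widths       = [8, 16, 32]   # channel width of the network body
depths       = [2, 3, 5]     # number of convolutional layers in the body
downsampling = [2, 4]        # spatial downsampling factor

def train_and_score(width, depth, factor):
    """Placeholder: train one candidate model, then measure its speed and image quality."""
    ...

# Evaluate every combination; the best model is the fastest one meeting the quality bar.
results = {}
for width, depth, factor in product(widths, depths, downsampling):
    results[(width, depth, factor)] = train_and_score(width, depth, factor)
```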
There are, of course, many ways to improve image quality. For example, applying instance normalization instead of the more common batch normalization helps with many styles, as does avoiding zero padding in the convolutional layers and applying different pre- and post-processing filters to the style or content image to reduce artifacts. In our testing, however, these methods worked well for some styles but not for all.
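To make the normalization difference concrete, the following sketch contrasts the two with standard PyTorch modules: batch normalization shares statistics across the whole batch, while instance normalization computes them per sample and per channel. The tensor shape is an arbitrary example.

```python
import torch
import torch.nn as nn

x = torch.randn(4, 16, 64, 64)            # (batch, channels, height, width)

batch_norm    = nn.BatchNorm2d(16)        # statistics shared across the whole batch
instance_norm = nn.InstanceNorm2d(16)     # statistics computed per sample, per channel

y_bn = batch_norm(x)
y_in = instance_norm(x)                   # normalizes each (sample, channel) plane on its own
```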
With continued improvements to the speed and image quality of style transfer, we believe that running a real-time image processing system built on the Caffe2 framework on mobile devices is just around the corner.
● ● ●
What's next?
Caffe2go, together with research toolchains such as Torch, is at the core of machine learning at Facebook, and its size, speed, and flexibility make it stand out in Facebook's tool stack.
We are excited to share our software and designs with the community so that we can all learn better ways to exploit different hardware platforms and algorithm designs, which matter greatly for cross-platform machine learning systems. In the coming months, we plan to open up parts of this AI framework.
As we move forward, you can imagine how real-time AI devices can help shape a more open and connected world for people in areas such as accessibility, education, and more. The smart devices in our hands will further change our understanding of AI. With fast and lightweight machine learning systems like Caffe2go, we will continue to work hard to provide you with more and better AI and augmented reality experiences, such as letting you feel like holding a Van Gogh brush while shooting a video.