
In-depth | Facebook's new deep learning mobile platform: how is the "video version" of Prisma implemented?

Latest update time:2017-01-04


Leifeng.com: "Style transfer" has been widely known since Prisma appeared. It uses neural representations to separate and recombine the content and style of an image, and can then render the image in an artistic style. However, as users have experienced, Prisma still has problems, such as servers overloaded by the large number of users and limited computing power for this kind of processing.


When publishing "The Art of AI Photo Editing: The Fantastic Algorithm Behind Prisma | In-depth", Leifeng.com editors asked several deep learning engineers why the technique had not been applied to video. Their answer was that, in addition to the problems Prisma already faced, keeping consecutive frames consistent over time is troublesome and places higher demands on the technology.


Now, so that you can feel as if you are holding Van Gogh's brush while shooting video on your phone, Facebook has built a new deep learning mobile platform. It combines the Caffe2go runtime with a style transfer model to extract, analyze, and process pixel features in real time on the phone. This article was compiled by Tupu Technology engineers from the Facebook post "Delivering real-time AI in the palm of your hand" and explains how the "video version" of Prisma is achieved. First published exclusively by Leifeng.com.


As video becomes an ever more popular way to communicate, we want to give everyone the most advanced creative tools for self-expression. We recently began testing a new creative effects camera on Facebook that helps people instantly turn ordinary videos into works of art.


This technology is called "style transfer". It extracts artistic style and features from one image, such as the style of a Van Gogh painting, and applies it to another image or video. This technology is usually difficult to implement, and in the past, the data had to be sent to a data center to be processed on a server with higher processing power. But now it is different! We have pioneered a new deep learning mobile platform that can extract, analyze and process pixel features in real time on mobile phones. The most advanced technology is now available in your hands.


This is a full-fledged deep learning system called "Caffe2go", and the framework is now embedded in our mobile applications. By compressing the AI models used to process images and video by a factor of 100, the system can run different types of neural networks efficiently on both iOS and Android. As a result, we can run AI inference on mobile phones in less than one twentieth of a second (50 milliseconds). For comparison, a blink of the eye takes about one third of a second, or 300 milliseconds.


(Figure: screenshot of a video shot with Facebook's creative effects camera.)


The style transfer tool described above combines two technologies: the Caffe2go runtime and the style transfer model. Because our AI teams work on both algorithms and large-scale systems, they were well placed to develop new models that serve both goals, making style transfer both fast and high quality. Together, these two technologies let you feel as if you are holding Van Gogh's brush while shooting video on your phone.


Three months ago, we set out to do something no one had done before: ship AI style transfer as a creative tool that runs in real time on people's devices. A talented group spanning product, engineering, and research joined the project. Justin Johnson of Facebook's AI research group is the author of a foundational research paper describing the technique, which builds on earlier work in the field. Our Applied Machine Learning group had been building an AI engine that runs on mobile devices. The Creative Camera team understands user needs very well. Working together, these teams developed a solution that runs highly optimized neural networks in real time on mobile devices. Below, we explain how we thought about and built this technology, starting with Caffe2go.



Caffe2go


  • Lightweight and fast


Artificial intelligence has had a huge impact on computer science, but it has mostly been confined to large data centers, which are often far from the people using AI-powered services. As a result, any technology that claims to process something with AI "in real time" still suffers from the latency of sending data to a data center to run on a GPU. We did not think it practical to ask people to carry a supercomputer around, so we looked for a way to run AI on the CPU of the most ubiquitous device of all: the smartphone.



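With Caffe2go, a smartphone can recognize, express, and understand without connecting to a remote server. Even so, smartphones have their limits. Their computing power has improved markedly in recent years, to the point where they can perform billions of arithmetic operations per second, but they still face resource constraints such as battery power, memory, and the computing capacity available to intelligent software. For machine learning systems, smartphones are therefore both an opportunity and a challenge.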


Our solution to this challenge was to design an extremely lightweight, modular framework. To do this, we followed the Unix philosophy and built on top of the open source Caffe2 project. This keeps the core framework, which declares and connects the components, very light, while allowing multiple modules to be attached, including mobile-specific optimizations. We kept a lean algorithmic framework that lets engineers describe an abstract computation as a directed acyclic graph (DAG), while imposing no constraints on the inputs and outputs of the nodes that can be executed in the graph. This lets our engineering teams implement and optimize modules on different platforms while connecting them easily. When the graph is actually run, it instantiates itself using the available hardware features to achieve maximum speed.
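
To make the graph idea concrete, here is a minimal Python sketch. It is not Caffe2go's API: the `GRAPH` layout, `run_graph`, and the toy operators are all hypothetical names chosen for illustration. It only shows operators being declared as a DAG and then executed in an order that respects their dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy operators; a real framework would plug in optimized kernels.
def scale(x, factor=2.0):
    return [v * factor for v in x]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def relu(x):
    return [max(0.0, v) for v in x]

# Each node maps a name to (operator, input names). "input" is fed externally.
GRAPH = {
    "scaled": (scale, ["input"]),
    "summed": (add, ["scaled", "input"]),
    "output": (relu, ["summed"]),
}

def run_graph(graph, feeds):
    """Execute every node once, in an order that respects its dependencies."""
    values = dict(feeds)
    deps = {name: set(inputs) for name, (_, inputs) in graph.items()}
    for name in TopologicalSorter(deps).static_order():
        if name in values:  # externally fed value, nothing to compute
            continue
        op, inputs = graph[name]
        values[name] = op(*(values[i] for i in inputs))
    return values

result = run_graph(GRAPH, {"input": [-1.0, 0.5, 2.0]})
print(result["output"])
```

Because the graph only names operators and their inputs, the function behind each name can be swapped for a platform-specific version without touching the graph definition.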


Because speed is key for compute-intensive mobile applications, especially image and video processing, the lightweight framework design lets us write platform-specific optimizations for individual operators. One notable example is a library called NNPACK, which Caffe2 integrates into our mobile runtime. By using a mobile CPU feature called NEON, we can significantly speed up computation on mobile. On iOS devices, we have also begun integrating acceleration features such as Metal. All of this is achieved through the modular design, without changing the overall model definition. As a result, the algorithm side and the runtime side can rely on each other safely without worrying about potential incompatibilities.


  • "Developer-friendly" design


Caffe2 is also our first industrial-strength deep learning platform that runs at full speed on four platforms with exactly the same code: server CPU, GPU, iOS, and Android. Thanks to its modular design, the framework speaks the same language on every platform while optimizing for each one. These are implementation details hidden from developers. For example, the framework can choose NNPACK on mobile devices (iOS and Android) or cuDNN on GPU servers. Algorithm developers can therefore focus on the algorithm itself without being distracted by how to run a convolution (a linear operation).
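
The kind of backend selection described here can be sketched as a small operator registry. This is an illustrative assumption about the pattern, not how Caffe2 is implemented: `KERNELS`, `register`, `run_op`, and the device names are hypothetical, and the two pure-Python functions merely stand in for libraries such as NNPACK and cuDNN.

```python
# Hypothetical registry: (operator name, device type) -> implementation.
KERNELS = {}

def register(op_name, device):
    def decorator(fn):
        KERNELS[(op_name, device)] = fn
        return fn
    return decorator

@register("conv1d", "mobile_cpu")
def conv1d_mobile(signal, kernel):
    # Stand-in for a mobile CPU kernel: explicit index arithmetic.
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

@register("conv1d", "server_gpu")
def conv1d_server(signal, kernel):
    # Stand-in for a GPU server kernel: same math, different code path.
    width = len(kernel)
    windows = (signal[i:i + width] for i in range(len(signal) - width + 1))
    return [sum(s * k for s, k in zip(window, kernel)) for window in windows]

def run_op(op_name, device, *args):
    """Model code names only the operator; the runtime picks the kernel."""
    return KERNELS[(op_name, device)](*args)

# Unpadded sliding-window product, as convolution layers are usually defined.
signal, kernel = [1, 2, 3, 4], [1, 0, -1]
mobile = run_op("conv1d", "mobile_cpu", signal, kernel)
server = run_op("conv1d", "server_gpu", signal, kernel)
print(mobile, server)
```

The caller never mentions which library runs the convolution, which is the sense in which the choice is hidden from algorithm developers.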


Developers also benefit from fast deployment by design. Debugging on a phone can be challenging because the mobile toolchain is not as advanced as those for desktops and servers. We address this by abstracting the neural network math away from the hardware: a serialized network in Caffe2go produces the same numerical output whether it is executed on a mobile phone or on a server. As a result, we can move most of the work (model training, performance testing, user experience research) to the server environment, and once everything looks right, deploy to the mobile environment with one click.
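
As a rough illustration of why a serialized network helps, the sketch below describes a tiny network as plain data, serializes it, and runs both the original and the restored copy. Everything here is hypothetical (`net`, `OPS`, `run`). Caffe2 itself serializes networks as protocol buffers; JSON is used only to keep the example within the Python standard library.

```python
import json

# Hypothetical network description: an ordered list of ops with parameters.
net = [
    {"op": "scale", "factor": 3},
    {"op": "shift", "offset": -4},
    {"op": "relu"},
]

# Operator implementations live in the runtime, not in the network definition.
OPS = {
    "scale": lambda x, p: [v * p["factor"] for v in x],
    "shift": lambda x, p: [v + p["offset"] for v in x],
    "relu": lambda x, p: [max(0, v) for v in x],
}

def run(net_def, x):
    """Apply each layer of the network definition in order."""
    for layer in net_def:
        x = OPS[layer["op"]](x, layer)
    return x

serialized = json.dumps(net)        # what would be shipped to the device
restored = json.loads(serialized)   # what the device-side runtime would load

data = [1, 2, 3]
server_out = run(net, data)
device_out = run(restored, data)
print(server_out, device_out)
```

Since the definition contains no hardware details, the same file can be validated on a server first and then handed to a different runtime.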


  • Training and testing of style transfer models


The idea of "style transfer" is not new. It was first proposed by a research team in the seminal paper "A Neural Algorithm of Artistic Style", published in August 2015. However, the technique was slow and required powerful servers. In the following months, the research team improved it and sped it up by orders of magnitude, but it still relied heavily on server-side computing power.


Now we can run artificial intelligence quickly on mobile devices, but in order to ensure a high-quality, high-resolution real-time image style transfer experience, we still need to continue to optimize and improve the model.


  • Optimization of efficient model size


Traditional style transfer models (including feed-forward variants) have large numbers of parameters and run slowly. Our goal for the style transfer application was to build new, smaller, more efficient models that output high-quality video at more than 20 frames per second, without dropping frames, on an iPhone 6s.


We took three main approaches to reducing model size: optimizing the number of convolutional layers (the most time-consuming part of the processing), optimizing the width of each layer, and adjusting the spatial resolution during processing.


The number of convolutional layers and their width act as separate levers on processing time, either by changing how many aspects of the image are processed or by changing how many times a processing step is applied. For spatial resolution, we can adjust the actual size of what is processed in the intermediate layers. By using early convolutions that downscale the image and late deconvolutions that upscale it again, the system has less information to process in between and runs faster. With this technique, we can significantly reduce the width and depth of the network while maintaining reasonably good image quality.
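
A back-of-the-envelope cost model shows why these three levers matter. The sketch below counts multiply-accumulate operations for a stack of identical 3x3 convolutions. The function names and all layer sizes are made-up assumptions for illustration and do not come from Facebook's models.

```python
def conv_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates for one stride-1, same-padded convolution."""
    return height * width * c_in * c_out * kernel * kernel

def network_macs(resolution, channels, depth, downsample=1):
    """Rough cost of `depth` identical conv layers run at reduced resolution.

    `downsample` models early strided convolutions that shrink the image
    before the main body; the cost of those layers themselves is ignored.
    """
    side = resolution // downsample
    return depth * conv_macs(side, side, channels, channels)

# Hypothetical baseline and three variants, each pulling one lever.
baseline = network_macs(resolution=256, channels=128, depth=10)
narrower = network_macs(resolution=256, channels=64, depth=10)
shallower = network_macs(resolution=256, channels=128, depth=5)
smaller = network_macs(resolution=256, channels=128, depth=10, downsample=2)

for name, cost in [("half width", narrower), ("half depth", shallower),
                   ("half resolution", smaller)]:
    print(f"{name}: {baseline / cost:.0f}x fewer multiply-accumulates")
```

In this simplified model, halving the width or the resolution cuts the work by a factor of four, while halving the depth cuts it by a factor of two, which is why the levers are worth tuning separately.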



  • Improve image quality


Image quality is subjective and hard to measure accurately, especially for something like style transfer. We therefore built visualization tools, including A/B tests, and trained different models to make sure we were getting the highest-quality results. Our large GPU cluster, powered by FBLearner Flow, made this possible: we could quickly sweep a wide range of hyperparameters (such as model architecture, content and style weights, and downsampling) to find a well-trained feed-forward style model that met the performance target while maintaining or improving image quality.


Of course, there are many other ways to improve image quality. For example, applying instance normalization instead of the commonly used batch normalization helps with many styles, as does avoiding zero padding in the convolutional layers to reduce artifacts, or applying different pre-processing and post-processing filters to the style or content image. However, in our testing we found that these methods work well for some styles and not for others.
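
The difference between the two normalization schemes can be shown with a small pure-Python sketch. It is a simplified illustration rather than a production layer: each "image" is a single flattened channel, the learned scale and shift parameters are omitted, and the function names are hypothetical.

```python
from statistics import fmean, pvariance

EPS = 1e-5  # small constant to avoid division by zero

def normalize(values, mean, var):
    return [(v - mean) / (var + EPS) ** 0.5 for v in values]

def instance_norm(batch):
    """Normalize each image's channel with its own mean and variance."""
    return [normalize(img, fmean(img), pvariance(img)) for img in batch]

def batch_norm(batch):
    """Normalize with one mean and variance shared across the whole batch."""
    flat = [v for img in batch for v in img]
    mean, var = fmean(flat), pvariance(flat)
    return [normalize(img, mean, var) for img in batch]

# Two single-channel "images" (flattened) with very different brightness.
batch = [[0.0, 1.0, 2.0, 3.0], [100.0, 110.0, 120.0, 130.0]]

inst = instance_norm(batch)
bn = batch_norm(batch)
print([round(fmean(img), 3) for img in inst])
print([round(fmean(img), 3) for img in bn])
```

With instance normalization every image is centered on its own statistics, so one image's brightness does not leak into another's output. With batch statistics the dark image ends up below zero and the bright one above it.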


With these speed and image quality optimizations to style transfer, a real-time image processing system running on the Caffe2 framework becomes possible on mobile devices.



What's next?


Caffe2go, together with research toolchains such as Torch, forms the core of Facebook's machine learning products. Because of its size, speed, and flexibility, Caffe2go is being rolled out widely across Facebook's stack.


We are also keen to share our software and designs with the community so that we can learn to make better use of different hardware platforms and algorithm designs, which is especially important for cross-platform machine learning systems. We plan to open source parts of this AI framework in the coming months.


Looking ahead, you can imagine how AI running in real time on devices could help make the world more open and connected in areas such as accessibility and education. The smart devices in our hands will keep changing our understanding of AI. With fast, lightweight machine learning systems like Caffe2go, we will keep working to bring you more and better AI and augmented reality experiences, such as the feeling of holding Van Gogh's brush while shooting a video.





