Regarding the application and practice of AI, Tencent AI researchers made the following reflections

Last updated: 2019-03-29


Summary of the technical salon "Intelligently Transforming the Future - A Brief Discussion on the Application and Practice of Artificial Intelligence Technology".

Article | Hwang Suncheong

Leifeng.com AI Technology Review: A technology salon themed "Intelligently Transforming the Future - A Brief Discussion on the Application and Practice of Artificial Intelligence Technology" was held in Beijing on March 23. It was hosted by Tencent Youtu and co-organized by Tencent Cloud, Tencent AI Lab and Geekbang. At the salon, five guests from Tencent and Intel shared their views on AI technology, products, practice and applications.

At the beginning of the event, Zhou Kejing, a product manager at Tencent Youtu, gave a talk on "The Practice and Application of Computer Vision Technology in Smart Retail".

In recent years, as online sales growth has slowed, competition in online shopping has shifted to fighting over existing customers rather than winning new ones. The emergence of the smart retail concept in 2016 further showed that consumption patterns are changing and that people care more about real offline experiences. At the same time, rapid technological progress has substantially lowered the cost of implementing smart retail.

Smart retail links online and offline channels around the customer. By digitizing in-store scenes and networking the resulting data, it provides a panoramic view of the business and thereby improves operating efficiency. In this process, computer vision plays the key role of connecting people, goods and places, from visiting the store to browsing to purchasing, so that AI can recognize who customers are and understand what they want.

Zhou Kejing briefly introduced the functions of computer vision at different stages of offline operations and the technologies involved:

Passing by the store - Entering the store

Purpose: Operation, anti-theft

Technologies involved: face detection, face attribute analysis, large-scale face retrieval

Shopping in the store

Purpose: Fine-grained passenger flow statistics, accurate positioning of customer attributes, and trajectory hotspots

Technologies involved: Human head tracking technology solution, human body ReID technology solution

Cashier

Technologies involved: face recognition + liveness detection
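The face retrieval step mentioned above can be illustrated with a minimal sketch. It assumes that a face model has already turned each face into a fixed-length embedding vector and that matching is done by cosine similarity against a gallery; the gallery contents, the 0.8 threshold and the function names are hypothetical and are not taken from the talk.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, gallery, threshold=0.8):
    """Return (identity, score) for the best gallery match, or (None, score) below the threshold."""
    best_id, best_score = None, -1.0
    for identity, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = identity, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

# Hypothetical 3-dimensional embeddings; real face embeddings have far more dimensions.
gallery = {
    "member_001": [0.9, 0.1, 0.0],
    "member_002": [0.0, 1.0, 0.2],
}
print(retrieve([0.88, 0.15, 0.02], gallery))  # close to member_001
print(retrieve([0.5, 0.5, 0.7], gallery))     # no match above the threshold
```

A linear scan like this only works for small galleries; large-scale retrieval would normally use an approximate nearest-neighbor index instead.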

The second speaker was Wang Chuannan, senior researcher of AI application research at Tencent Youtu, whose topic was “From Hardware to Algorithm - Tencent Youtu AI Terminal Product Practice”.

As computer vision technology matures, demand for computer vision combined with hardware keeps growing, and such products are now used across many industries. Wang Chuannan traced the evolution of liveness detection in detail: from the earliest digit-reading approach (lip movement + voice), to action-based interaction with anti-recapture checks, to the first light-based liveness technology, which Youtu launched in 2017 and which verifies the three-dimensional shape and texture of a face by emitting random light signals from the screen while capturing images. Most recently, 3D liveness detection has also come into wide use.

Even the most effective 3D detection solution still runs into many difficulties in deployment, especially the need to adapt to complex lighting environments while keeping the facial area sharp. This places requirements on the ISP, resolution, frame rate, depth accuracy and working distance, all of which have to be addressed jointly.

In addition, for the software to fit the hardware well, the algorithms must be optimized for on-device performance. To this end, Tencent Youtu independently developed NCNN, a high-performance forward-computation (inference) framework for mobile devices, and RapidNet, a deep learning inference framework; the former has been open sourced.

NCNN is a high-performance neural network forward-computation framework that is heavily optimized for mobile phones. Its main advantages are:

• Supports convolutional neural networks, supports multi-output and multi-branch structures, and can calculate partial branches

• ARM NEON assembly-level optimization, extremely fast calculation speed

• Sophisticated memory management and data structure design, extremely low memory usage

• Supports multi-core parallel computing acceleration, ARM big.LITTLE CPU scheduling optimization

• Scalable model design, supports 8-bit quantization and half-precision floating point storage, can import Caffe models
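To make the 8-bit quantization item concrete, here is a minimal sketch of symmetric per-tensor int8 quantization. It is a generic textbook scheme chosen for illustration, not NCNN's actual implementation, and the function names and sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, with q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.52, -1.27, 0.003, 0.9]  # hypothetical float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale, which is why accuracy usually drops only slightly.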

RapidNet is a deep learning inference framework whose strengths include cross-platform support, high performance, model compression and code trimming. It provides a unified calling interface and consistent optimization strategies across platforms. For heterogeneous networks, RapidNet makes effective use of hardware acceleration and handles task scheduling across multi-core CPUs and GPUs. On the difficult problem of quantization, RapidNet improves the performance of models such as gesture detection and tracking by 20%-40% in most cases, while keeping the average accuracy loss within 0.5%.

Afterwards, Jin Mingjie, a senior researcher from Tencent AI Lab, presented "Applications and Practices of Voice Technology Based on AI Lab".

Speech is the sound a person makes. For a machine to understand it, the speech is usually captured as an audio signal, an information carrier in which the frequency and amplitude of a sound wave vary in regular patterns. The key parameters are the sampling rate, the quantization bit depth and the encoding algorithm. Common speech technologies take one of two forms: going from speech to target information, or going from given information to speech. The main technologies involved are voice wake-up, voiceprint recognition, speech recognition, voice activity detection and speech synthesis.
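As a rough illustration of how sampling rate and bit depth shape an audio signal, the sketch below generates a short sine tone as uncompressed PCM samples and computes the resulting data rate. The 16 kHz rate, 16-bit depth and 440 Hz tone are assumed example values, not figures from the talk.

```python
import math

def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels=1):
    """Uncompressed PCM data rate: samples per second times bytes per sample."""
    return sample_rate_hz * bit_depth * channels // 8

def quantize_sample(x, bit_depth=16):
    """Map a float in [-1.0, 1.0] to a signed integer of the given bit depth."""
    max_int = 2 ** (bit_depth - 1) - 1
    clipped = max(-1.0, min(1.0, x))
    return int(round(clipped * max_int))

def sine_pcm(freq_hz, sample_rate_hz, duration_s, bit_depth=16):
    """Sample a sine wave at the given rate and quantize every sample."""
    n = int(round(sample_rate_hz * duration_s))
    return [
        quantize_sample(math.sin(2 * math.pi * freq_hz * i / sample_rate_hz), bit_depth)
        for i in range(n)
    ]

samples = sine_pcm(440, 16000, 0.01)  # 10 ms of a 440 Hz tone at 16 kHz
print(len(samples), pcm_bytes_per_second(16000, 16))
```

A higher sampling rate captures higher frequencies and a larger bit depth reduces quantization noise, both at the cost of more data per second; the encoding algorithm then decides how much of that data is actually stored or transmitted.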

Take a smart speaker as an example. The speaker's front end picks up the user's speech, and after voice wake-up and the various front-end processing steps, the signal is sent to the cloud for voiceprint recognition and speech recognition. Once the speech has been recognized as text, semantic understanding takes over: intent recognition is performed on the text and the matching function module is called, so that users can listen to songs, get weather forecasts, listen to audiobooks and so on. Finally, a response is returned to the user.

Voice wake-up is judged mainly by three metrics: FA (false accept, i.e. a false wake-up), FR (false reject, i.e. a failure to wake up) and EER (equal error rate, the operating point at which FA equals FR). In practice, the first step is to choose the modeling unit, after which a neural network model is trained. To keep wake-up reliable, the device should wake only when the spoken content satisfies both temporal continuity and the correct word order. How strictly these conditions are set is a product-experience decision. Common wake-up architectures fall into two types, single-model and dual-model. The former has a simple structure, but the model itself is complex and power-hungry, and some small chips may not be able to run it. The latter has a more complex structure but lower power consumption, and part of the wake-up model can be placed in the cloud to reduce false wake-ups.
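The relationship between FA, FR and EER can be sketched as a threshold sweep over wake-word detector scores. The sketch treats FA and FR as simple rates over labeled clips, and the scores and function names are made-up illustrations rather than data from the talk.

```python
def fa_fr_rates(positive_scores, negative_scores, threshold):
    """FA rate: share of non-wake-word clips accepted. FR rate: share of wake-word clips rejected."""
    fa = sum(s >= threshold for s in negative_scores) / len(negative_scores)
    fr = sum(s < threshold for s in positive_scores) / len(positive_scores)
    return fa, fr

def equal_error_rate(positive_scores, negative_scores):
    """Sweep candidate thresholds and return (threshold, fa, fr) where |FA - FR| is smallest."""
    candidates = sorted(set(positive_scores) | set(negative_scores))
    best = None
    for t in candidates:
        fa, fr = fa_fr_rates(positive_scores, negative_scores, t)
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            best = (gap, t, fa, fr)
    _, t, fa, fr = best
    return t, fa, fr

# Hypothetical detector scores for clips that do / do not contain the wake word.
positives = [0.9, 0.8, 0.75, 0.6, 0.4]
negatives = [0.7, 0.5, 0.3, 0.2, 0.1]
print(equal_error_rate(positives, negatives))
```

Raising the threshold lowers FA and raises FR, so a product would typically pick its operating point on this curve according to how annoying false wake-ups are compared with missed ones.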

As for the front-end technology, the microphone array is mainly used to achieve the following effects:

• Speech enhancement/dereverberation

• Sound source localization

• Echo cancellation
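Of the microphone-array functions listed above, sound source localization is the easiest to sketch. The example below assumes a two-microphone array and estimates the arrival angle from the time difference between the two channels, found by brute-force cross-correlation; the signals, the 0.1 m spacing and the function names are hypothetical, and real arrays use more microphones and more robust estimators.

```python
import math

def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a, via brute-force cross-correlation."""
    best_lag, best_corr = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        corr = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                corr += sig_a[i] * sig_b[j]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag

def arrival_angle_deg(lag_samples, sample_rate_hz, mic_distance_m, speed_of_sound=343.0):
    """Angle from the array's broadside direction implied by the inter-microphone delay."""
    tau = lag_samples / sample_rate_hz
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_distance_m))
    return math.degrees(math.asin(ratio))

# The same short pulse reaches microphone B two samples after microphone A.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 2, 1, 0, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=4)
angle = arrival_angle_deg(lag, sample_rate_hz=16000, mic_distance_m=0.1)
print(lag, round(angle, 1))
```

Once the direction is known, the array can steer its sensitivity toward the speaker, which is the basis for the speech enhancement item in the same list.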

Speech recognition converts the words in human speech into computer-readable input: the audio signal is sent to the cloud, where a decoder produces the recognition result.

The part of the decoder that converts the audio signal into modeling units is the acoustic model. Common choices include:

• DNN network - input layer at the bottom, N hidden layers in the middle, and output layer at the top. It has relatively small computational requirements and is very easy to deploy, and can be run on almost any device.

• CLDNN network - C stands for convolutional network, L stands for LSTM network, and D stands for DNN. The advantage of this network is that it converges quickly and can quickly achieve better recognition results.
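The DNN structure described above (an input layer, N hidden layers and an output layer) can be sketched as a plain forward pass. The layer sizes, weights, ReLU activation and softmax output below are illustrative assumptions; a real acoustic model takes a window of acoustic features as input and outputs posteriors over thousands of modeling units.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Numerically stable softmax, turning output scores into probabilities."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dnn_forward(features, hidden_layers, output_layer):
    """Pass the features through every hidden layer, then the output layer."""
    activations = features
    for weights, biases in hidden_layers:
        activations = relu(dense(activations, weights, biases))
    weights, biases = output_layer
    return softmax(dense(activations, weights, biases))

# Hypothetical tiny network: 2 input features, 1 hidden layer of 2 units, 3 modeling units.
hidden = [([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1])]
output = ([[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0])
posteriors = dnn_forward([1.0, 2.0], hidden, output)
print(posteriors)
```

Because the whole computation is a chain of matrix-vector products, it is cheap and easy to port, which fits the point above about DNNs running on almost any device.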

Finally, Jin Mingjie also gave us an outlook on the development of voice products. The parts that need to be improved include:

• Dialects, Mandarin

• Multilingual mix

• Voice changers

• Multiple people talking

At the end of the event, Zhou Jicheng, a senior product manager from the Tencent Cloud Big Data and Artificial Intelligence Product Center, spoke on “Tencent Cloud Face Authentication Technology Principles and Best Practices”.

Face authentication, put simply, means real name and real person:

Real name means that the name you give is legal and valid.

Real person means proving that you are who you say you are.

We have all had this experience in the past. To open an account at a bank or with a telecom operator, you had to show up in person. Elderly people who wanted to collect their pensions had to go to the Social Security Bureau to prove their identity in person. These costs are very high. Moreover, although online services are now commonplace, verifying identity online remains difficult, to say nothing of identity fraud or of being checked offline when you are not carrying your ID card. For these reasons, the central bank, the operators and the insurance industry all advocate using OCR technology in business processes to improve efficiency. This is the background against which facial recognition is being applied in China.

For liveness detection, the most typical scenario is remote identity verification: first the ID card is read by OCR, then the system prompts the user to read a number aloud to prove that they are present in person, and finally a video is recorded to produce the result. Along the way, the system also runs a photo comparison. This flow is embedded in many business processes, such as renewing an ID card or changing an ID card number.
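The remote verification flow above can be sketched as a sequence of checks. This is a simplified illustration only: the field names, the 0.8 similarity threshold and the assumption that the photo comparison is between the recorded face and the ID photo are all hypothetical, and the OCR, speech and face models themselves are represented by their outputs rather than implemented.

```python
from dataclasses import dataclass

@dataclass
class VerificationAttempt:
    ocr_id_number: str      # ID number read from the card by OCR ("" if OCR failed)
    prompted_digits: str    # digits the system asked the user to read aloud
    spoken_digits: str      # digits recognized from the recorded video
    face_similarity: float  # similarity between the recorded face and the ID photo, 0..1

def verify(attempt, face_threshold=0.8):
    """Run the checks in the order described and return (passed, reason)."""
    if not attempt.ocr_id_number:
        return False, "ocr_failed"
    if attempt.spoken_digits != attempt.prompted_digits:
        return False, "digit_challenge_failed"
    if attempt.face_similarity < face_threshold:
        return False, "face_mismatch"
    return True, "ok"

# Placeholder ID values; not a real ID number format.
print(verify(VerificationAttempt("ID-EXAMPLE-0001", "4821", "4821", 0.93)))
print(verify(VerificationAttempt("ID-EXAMPLE-0001", "4821", "4812", 0.93)))
```

Returning a reason alongside the result makes it straightforward for the calling business process to decide whether to retry, reject or escalate an attempt.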

Overall, liveness verification is a technology that keeps evolving. Putting a liveness algorithm into practical use is really a matter of trading off user experience against security. In the early days of action-based interaction, for example, users disliked it and found the verification mode silly. Later, when WeBank introduced digit reading, security improved but users still did not accept it. This led to the later "Laser Guard": liveness detection through screen reflection, as well as infrared and 3D structured light, which offer higher security levels.

To some extent, identity verification also has to combine multiple modes to reach a higher level of security. Even so, many "attacks" are still inevitable, and relying on the underlying algorithm alone is unrealistic. Other measures worth considering include security controls at the access-channel level, back-end risk control, manual review, or a combination of multiple liveness modes.

- END -
