What core technology do the AI robocall companies exposed by CCTV's 315 program use?
The early telecommunications industry's "call you to death" and number-changing software black market have evolved into today's AI robot harassment calls.
Text | Zhao Chenxi
The annual CCTV 315 gala is the most nerve-racking moment of the year for enterprises. Last night, the CCTV 315 program team exposed violations in many industries: medical waste, dangerous spicy strips, the tricks behind "local eggs", unhygienic sanitary products, the many ploys of home appliance after-sales service, and more. The industrial chains behind them are huge and shocking. Among them, the exposure of harassing calls from smart robots attracted particular attention.
Everyone receives sales calls in daily life: real estate, bank loans, training institutions, education, cars, and so on. However, most people may not know that the caller on a marketing call may not be a real person but an AI robot. First, a probe box identifies mobile phones connected to the wireless network. Then it obtains the user's private MAC address without the user's knowledge. The MAC address is then converted into a mobile phone number by "matching" it against big data. Finally, an AI robot that simulates a human makes the outbound call.
These probe boxes are widely distributed in public places such as shopping malls, supermarkets, office buildings, convenience stores, etc., and are very concealed. CCTV exposed a number of companies. The entire industry chain includes intelligent robot harassment calls + big data marketing + probe boxes. The specific companies are:
- Yige Technology Co., Ltd.
- Shaanxi Yilongxinke Artificial Intelligence Technology Co., Ltd.
- Zhongke Zhilian Technology Co., Ltd.
- Biho Technology Co., Ltd.
- Shengya Technology Co., Ltd.
- Samoyed Internet Financial Technology Co., Ltd.
- Shenzhen Miaodi Technology Co., Ltd.
- Shanghai Zhizi Information Technology Co., Ltd.
- Lingwo Network Technology Co., Ltd.
- Fortune Technology Co., Ltd.
- Hangzhou DiJin Network Technology Co., Ltd.
CCTV's 315 program reported that a single company can make more than 4 billion calls a year. In the telecommunications industry, "nuisance calls" have never been eradicated. The problem touches on network security, the communication networks of different operators, Internet access to those networks, the responsibilities of calling and called parties, and many other aspects. In recent years, as new technologies have emerged and iterated, the early telecommunications industry's "call you to death" and number-changing software black market have evolved into today's AI robot nuisance calls, and the technology keeps being upgraded.
Analysis of similar cases abroad
Do you remember Google I/O 2018, Google's annual developer conference held in California? In addition to new releases and updates such as Android P, Gmail, Gboard, and TPUv3, Google added Duplex to its personal assistant, Google Assistant. Duplex can call restaurants, hair salons, and other businesses to make appointments on the user's behalf.
From the demonstrations at the conference, we can see that Duplex can not only talk with humans in a natural, fluent voice without being noticed, but also handle unexpected situations successfully. For example, it can respond to filler sounds such as "emm" and "uha", understand the context of the conversation, and volunteer relevant information. Of course, Google is not the only company in the world that has achieved this magical "effect".
Subsequently, Microsoft also stood up and issued a technical statement:
The significance of full-duplex voice technology is that it can transform "human-computer interaction" into "human-computer communication." The difference of one word has huge value.
On April 4 this year, we officially released Full Duplex Sensory in the United States and China simultaneously, and predicted that the industry will realize the value of this technology and accelerate its focus in this direction. We are very happy to see more and more industry peers joining us.
In fact, the first full-duplex voice call with artificial intelligence in human history did not happen in the United States, but in China. We are honored to dedicate this crown to our motherland. Since August 2016, Microsoft (Asia) Internet Engineering Academy has enabled XiaoIce to complete more than 600,000 calls with human users, all initiated by the users themselves.
Today, we are releasing an actual recording of a phone call that took place two years ago, and we dedicate it as precious material to Chinese speakers all over the world.
The core technology behind Google Duplex is a recurrent neural network (RNN) built with TensorFlow Extended (TFX). To achieve high accuracy, Google trained Duplex's RNN on anonymized phone conversation data. The network takes as input the text output of Google's automatic speech recognition (ASR), as well as features from the audio, the conversation history, and the parameters of the conversation (such as the service to be booked and the current time). Google trained a separate understanding model for each task, but some training corpora are shared across tasks. Finally, Google also used TFX's hyperparameter optimization to further improve the model.
The input speech is first processed by the automatic speech recognition system (ASR), and the generated text is input into the RNN network together with the context data and other inputs. The generated response text is then read out through the text-to-speech (TTS) system.
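The flow described above can be sketched as a simple three-stage pipeline. This is only a minimal illustration under stated assumptions, not Google's implementation: the names `DialogueState`, `run_turn`, `toy_asr`, `toy_responder`, and `toy_tts` are all hypothetical, and the toy functions merely stand in for trained ASR, RNN, and TTS components.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DialogueState:
    """Context the response model sees besides the current utterance (hypothetical)."""
    task: str                                   # e.g. the service to be booked
    history: List[str] = field(default_factory=list)


def run_turn(
    audio: bytes,
    state: DialogueState,
    asr: Callable[[bytes], str],
    responder: Callable[[str, DialogueState], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """One conversational turn: speech in -> text -> reply text -> speech out."""
    heard = asr(audio)                  # 1. speech recognition
    reply = responder(heard, state)     # 2. response model sees text plus context
    state.history += [heard, reply]     # 3. remember the turn for later context
    return tts(reply)                   # 4. speech synthesis


# Toy stand-ins so the sketch runs; real components would be trained models.
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_responder(text: str, state: DialogueState) -> str:
    turn = len(state.history) // 2 + 1
    return f"[{state.task}] turn {turn}: you said '{text}'"


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the three stages in as functions keeps the sketch independent of any particular speech or modeling library; each one could be swapped for a real service without changing `run_turn`.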
Google combines a concatenative TTS engine with a generative TTS engine (which uses Tacotron and WaveNet) to control the intonation of the voice according to the situation. The system can also generate filler sounds (such as "hmmm" and "uh"), which makes the voice sound more natural.
Fillers are added to the generated speech when the concatenative TTS has to join speech units that differ widely, or when a pause needs to be inserted. They let the system signal to the other party in a natural way "Yes, I'm listening" or "I'm still thinking about it" (people often use such fillers while they think). Google's user studies also confirmed that people find conversations with fillers more familiar and natural. On the other hand, the system's latency must match what people expect in conversation. In some cases the system even uses a fast approximation model instead, which lets it respond in less than 100 ms.
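One way to picture this trade-off between latency and naturalness is a small reply-planning policy. The following is a hedged sketch only: the source does not describe how Google decides between these paths, so `plan_reply`, the predicted-latency input, the `FILLER` phrase, and the toy models are all assumptions made for illustration. Only the 100 ms figure comes from the text.

```python
from typing import Callable, Optional

LATENCY_BUDGET_MS = 100   # latency figure quoted in the article
FILLER = "Hmm,"           # hypothetical filler phrase


def plan_reply(
    utterance: str,
    predicted_full_ms: float,
    full_model: Callable[[str], str],
    fast_model: Callable[[str], Optional[str]],
) -> str:
    """Pick a reply strategy from how long the full model is expected to take."""
    if predicted_full_ms <= LATENCY_BUDGET_MS:
        return full_model(utterance)        # fast enough: answer directly
    quick = fast_model(utterance)
    if quick is not None:
        return quick                        # approximate answer, low latency
    # No shortcut available: speak a filler first, then the full answer.
    return f"{FILLER} {full_model(utterance)}"


# Toy stand-ins for the slow full model and the fast approximation.
def toy_full(utterance: str) -> str:
    return f"Full answer to: {utterance}"


def toy_fast(utterance: str) -> Optional[str]:
    return "Hello!" if utterance.lower().startswith("hello") else None
```

For example, `plan_reply("Hello there", 300, toy_full, toy_fast)` takes the fast path, while a slow question with no shortcut gets the filler prefix so the listener hears something while the full answer is produced.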
Microsoft's technical statement also suggests that its full-duplex voice interaction technology, Full-Duplex Voice, is technically very similar to Google's Duplex. However, the generation model used by Microsoft is an LSTM, while Google uses an RNN.
As Microsoft said, "In fact, the first full-duplex voice call with artificial intelligence in human history did not happen in the United States, but in China." Judging by the application scenarios of both Google and Microsoft, the original purpose of studying "human-computer communication" was a good one: to free people from monotonous, simple, unskilled labor. At present, however, some domestic companies use AI-based full-duplex voice calls in gray areas, resulting in a flood of harassing calls. So what technologies do the companies exposed in China actually use?
Experts explain the technology and ethical standards behind it
To this end, Leifeng.com interviewed Wang Shijin, deputy director of iFlytek AI Research Institute. Wang Shijin told Leifeng.com that AI conversational robots are a type of human-computer interaction system mainly used in service scenarios. Its background mainly involves multiple AI core technologies such as speech recognition, semantic understanding, conversational question and answer, speech synthesis, knowledge graph, etc. In addition, it also requires engineering technology support such as process control, telephone exchange platform, and communication lines.
The telephone is a typical human-computer interaction scenario, alongside WeChat, web pages, apps, and others. Interaction over the telephone is real-time and two-way, and because the audio quality of the telephone channel is relatively poor and voice is the only information carrier, the technical complexity is generally high.
These companies exposed in China generally do not have core AI technology, and their system backends often call on the open platform capabilities of other AI companies. From a technical point of view, the intelligent voice technology used by telemarketing robots is very basic, mainly converting the original human speech into a computer broadcast, and calling some simple voice recognition technology.
However, these companies often choose to play back pre-recorded human voices instead of using speech synthesis, which is not intelligent, but is simpler and cheaper. At present, Google, Microsoft, and domestic companies such as iFlytek and Alibaba have relatively comprehensive core AI technology capabilities, and telephone conversation robots are a typical application of these capabilities.
iFlytek's current telephone robot technology is mainly used in scenarios such as industry customer service, telephone ordering, and logistics ordering. It focuses on solving problems in the field of intelligent services, improving efficiency, and reducing costs, and has significant application value. For customers who actually purchase services, iFlytek states in the agreement that outgoing calls cannot be used for illegal purposes such as "nuisance calls". Once discovered, the service will be terminated immediately. After inquiries, many telephone sales robot companies on the market that claim to "use iFlytek's services" were found to be not iFlytek's customers, but just using iFlytek's name.
China's economy is developing rapidly, and society and the public are relatively tolerant of the application of emerging technologies. Therefore, driven by commercial interests, it is relatively easy for some ethical issues in technology application to arise. We believe that telemarketing robots that specifically make "nuisance calls" are not a technical issue, but a social ethical issue.
If AI technology is compared to a weapon, its ultimate effect depends on who uses it and how it is used. The pursuit of commercial interests should not harm the interests of others, including their commercial interests and rights such as personal privacy. We should pursue a win-win business logic. This requires society and the industry to jointly advocate the concept of value creation and to strengthen regulation and supervision through more laws and regulations.
In November last year, the Ministry of Industry and Information Technology announced the "Work Plan for Promoting the Special Action of Comprehensive Rectification of Nuisance Calls", which severely rectified the problem of nuisance calls and made strict regulations. With the rapid development and application of artificial intelligence technology, the availability of telephone conversation robots has been greatly improved. They have been rapidly applied in many fields such as intelligent services, finance, logistics, and medical care, and have also produced huge social and economic benefits.
Wang Shijin believes that this system should be used first in service communication scenarios where there is a lot of manual repetitive work, so as to free up people's energy to do more valuable things. For example, customer service or consulting services in the fields of intelligent services, finance, education, and medical care, such as the confirmation of information between couriers and customers when delivering packages, and routine follow-up visits to patients by hospitals or communities.
Summary
Leifeng.com believes that artificial intelligence is not only a science and an industry, but also involves all aspects of social life. It is very likely to change the employment structure, impact the law and social ethics, infringe on personal privacy, and challenge the norms of international relations. The security risks and challenges involved, how to develop safely, reliably and controllably in the future, and the ethical constraints behind it have always been a concern of countries around the world.
During the two sessions this year, Baidu CEO Robin Li also proposed that from the perspective of society, government and the public, we need to consider what should be done and what should not be done, what is good and what is bad in the development of artificial intelligence technology. We should make some regulations and predictions as soon as possible to avoid the development of artificial intelligence in a bad direction.