Application prospects of intelligent voice technology in the security industry-EEWORLD

From the era of text to the era of images, and now to an era in which voice is everywhere, the rapid spread of intelligent voice technology keeps reshaping how people live. The arrival of Amazon Echo is its most visible milestone.

Broadly speaking, intelligent voice technology is still most widely used in smart products (smart speakers, robots) and smart homes, and speech recognition is the core technology that has actually reached deployment. That said, even in this first stage of development, it is time for some more novel deployment scenarios to appear.

Against this background, this article briefly analyzes the application of speech recognition technology in the security industry.

The security industry should be an excellent entry point for voice recognition

As artificial intelligence empowers one industry after another, many companies have shifted their strategies to "AI +". Given the broad application prospects of the security industry, "AI + security" has quickly become a dominant theme in the market. As a major branch of artificial intelligence, intelligent voice technology naturally needs to find its role and its scenarios in the security industry, and speech recognition is first in line.

Getting humans and machines to understand each other, that is, human-machine interaction, has always been at the core of intelligent security. As the core deployed technology for human-machine interaction, speech recognition has found several footholds in the security industry, mainly in security robots such as intelligent inspection robots.

Like other service robots that can speak, security robots pick up external sounds through built-in microphones and recognize and interpret human speech. Once a robot determines that the speech signals dangerous behavior, it automatically triggers the alarm system and enters a defensive state, protecting the target person.

Beyond security robots, speech recognition also plays a key role in smart hotels, another scenario within the security industry. In the "future hotel" recently opened by Alibaba, face recognition is the main technology, but the smart robots that run through the whole service process are just as indispensable. Robots act as the front desk and guide guests throughout their stay, and in the room guests can improve their experience by talking to Tmall Genie. Whether it is the robot at the front desk or Tmall Genie acting as a waiter, all of them complete human-computer interaction through speech recognition, and this full-stack voice interaction system creates an interconnected smart scene that is available anytime and anywhere.

Of course, speech recognition in the security industry also extends to other intelligent scenarios such as smart finance and smart education.

Alternatively, intelligent voice technology can serve as a "good helper" for "face recognition"

Video surveillance built around facial recognition is the main application in the security industry, and that needs little elaboration. But could intelligent voice technology be used in the future to assist facial recognition and make video surveillance more intelligent?

The market talks mostly about speech recognition, and few companies notice that voiceprint recognition and speech emotion recognition are also intelligent voice technologies.

Voiceprint recognition, also known as speaker recognition, converts sound signals into electrical signals and then uses a computer to identify the speaker. It is divided into speaker identification (deciding which of several enrolled speakers is talking) and speaker verification (confirming whether the speaker is the person claimed). Different scenarios call for different variants: identification may be needed to narrow the scope of a criminal investigation, while verification is needed for bank transactions.
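The difference between identification (one-to-many) and confirmation, often called verification (one-to-one), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-number "voiceprints", the names `alice` and `bob`, the cosine-similarity comparison and the 0.8 threshold are all hypothetical and do not describe any particular product.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(probe, enrolled):
    """1:N identification: return the enrolled name with the closest voiceprint."""
    return max(enrolled, key=lambda name: cosine(probe, enrolled[name]))

def verify(probe, claimed, threshold=0.8):
    """1:1 verification: accept only if similarity reaches the (assumed) threshold."""
    return cosine(probe, claimed) >= threshold

# Toy voiceprints (hypothetical values); real systems use far larger vectors.
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.3]}
probe = [0.85, 0.15, 0.25]

print(identify(probe, enrolled))        # which enrolled speaker is this?
print(verify(probe, enrolled["bob"]))   # is this really bob?
```

Identification always returns the best match among the enrolled speakers, which suits narrowing a list of suspects; verification answers yes or no against a single claimed identity, which suits authorizing a transaction.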

Speech emotion recognition is one form of emotion recognition, in which a computer automatically identifies the emotional state carried by input speech. Using sensors, the computer measures and analyzes how speech signals with different tones and expressions are structured and distributed in terms of timing, amplitude, fundamental frequency and formants, and uses this to identify the emotional content implied in the tone of voice.
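Two of the features named above, amplitude and fundamental frequency, can be measured on a single frame of audio as in the following sketch. It is a simplified illustration rather than a description of any deployed system: the test signal is a synthetic 200 Hz tone, the function names and the 50 to 400 Hz search range are assumptions, and formant analysis and the emotion classifier itself are left out.

```python
import math

def rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_f0(frame, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate fundamental frequency from the lag of the autocorrelation peak."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for voiced speech: a 200 Hz tone, 0.1 s at 8 kHz.
sample_rate = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

print(round(rms(frame), 3))             # amplitude feature
print(estimate_f0(frame, sample_rate))  # fundamental-frequency feature
```

In a real pipeline these values would be computed frame by frame, and their trajectories over time (for example rising pitch or increasing loudness) would be passed to a trained model.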

Current face recognition reaches recognition rates of 99% or even 99.9%, but the remaining 1% or 0.1% is a hard problem that current technology cannot solve. Imagine adding voiceprint recognition and speech emotion recognition to today's face-recognition video surveillance systems: the resulting audio-visual fusion (including lip reading) might help infer the intent and behavior of target subjects even when they are silent. Would video surveillance then rise to a new level of intelligence and truly achieve "prevention before it happens"?

A multimodal interactive system that combines face recognition, voiceprint recognition and speech emotion recognition should open up many new applications in the security industry, such as scene analysis and event detection. In the new round of AI-driven industrial transformation, multimodal technology is likely to be a key to success.

However, intelligent voice still faces difficulties before it can take hold in the security industry

"No voice, no security" sounds like an attractive vision. Unfortunately, many difficulties remain before intelligent voice can take a leading role in the security industry.

It is widely believed that four "hows" remain to be solved for artificial intelligence in the security industry: how to create scenario-based AI applications that meet user needs; how to build intelligent industry systems that solve practical problems; how to improve infrastructure, industry standards and security mechanisms; and how to build a new intelligent industry ecosystem of mutual benefit. These four questions apply equally to intelligent voice technology in the security industry.

Far-field speech recognition is arguably the most critical technology for intelligent voice in the security industry, but it still faces three major technical bottlenecks: echo, noise and reverberation. The most intuitive example: when security robots work in public areas, they pick up too many voice signals at once and cannot separate the target voice, so normal recognition fails.
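The noise part of this bottleneck can be made concrete with a rough signal-to-noise calculation. The sketch below rests on simplifying assumptions that are not from the article: the target voice and the background are modeled as pure tones, the target's amplitude falls off as 1/distance (an idealized free-field model with no echo or reverberation), and the background level stays constant at the microphone.

```python
import math

def power(samples):
    """Mean power of a signal."""
    return sum(v * v for v in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(power(signal) / power(noise))

def tone(amplitude, freq, sample_rate=8000, n=800):
    """Synthetic stand-in for a sound source (hypothetical test signal)."""
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

background = tone(0.5, 300)      # constant background chatter at the microphone
near = tone(1.0 / 1.0, 200)      # target speaker 1 m away
far = tone(1.0 / 4.0, 200)       # same speaker 4 m away (amplitude ~ 1/distance)

print(round(snr_db(near, background), 2))  # target is louder than the background
print(round(snr_db(far, background), 2))   # target is now quieter than the background
```

Under these assumptions, moving the speaker from 1 m to 4 m turns a positive signal-to-noise ratio into a negative one before echo and reverberation are even considered, which is one way to see why separating the target voice becomes difficult in public areas.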

Another example is the speech emotion recognition mentioned above. Characterizing emotion in speech is much harder than in facial expressions: facial expression signals convey personal characteristics and expressions but no linguistic information, whereas speech signals mix several kinds of information, including speaker characteristics, emotion, and the vocabulary and grammar emphasized in the content. Training therefore requires much more data than face recognition.

Beyond the technical difficulties of far-field speech recognition and speech emotion recognition, intelligent voice technology itself still has many unsolved problems, including accents, target speaker separation, mixed-language speech, efficient transfer and data iteration, industry standards, and defense against attacks. As a result, its applications at this stage, in security and in every other industry, seem better described as "artificial intelligence" in quotation marks.

Summary

The industry generally holds that AI is not a show of skill but a real means of driving technological innovation and solving industry problems. Now that artificial intelligence is entering large-scale application, it is all the more necessary to balance choosing a role against choosing a scenario in order to stand out from homogeneous competition.

How can the technical bottlenecks be broken so that AI empowers all walks of life? The four directions proposed by Liang Jiaran, Chairman and CTO of Yunzhisheng, may offer a more rational way of thinking: solving the problems deep learning faces in industrial-scale applications; solving the problems of non-big data, end-to-end, and sequence mapping; effectively combining data and knowledge to form an efficient iterative closed loop; and fundamentally improving the machine's cognitive and learning capabilities.

In 2019, artificial intelligence has gradually returned to rationality, and more and more problems have begun to surface. But for the industry, this is both the worst of times and the best of times.
