From the era of text to the era of images, and now to an era in which voice is everywhere, the spread and rapid growth of intelligent voice technology has kept reshaping how people live. The arrival of Amazon Echo is the most striking milestone.
Broadly speaking, intelligent voice technology is still most widely used in smart products (smart speakers, robots) and smart homes, and speech recognition is the core technology through which it reaches real deployments. It is worth noting, however, that it is time for some relatively novel deployment scenarios to appear in this first stage of the development of intelligent voice technology.
Against this background, this article briefly analyzes the application of speech recognition technology in the security industry.
The security industry should be an excellent entry point for speech recognition
As artificial intelligence empowers one industry after another, many companies have shifted their strategies to "AI +". Given the broad application prospects of the security industry, "AI + security" has quickly become the dominant theme of the market. As a major branch of artificial intelligence, intelligent voice technology naturally needs to "choose a career" and "find a scene" in the security industry, and the first technology to do so is speech recognition.
Humans and machines learning to understand each other, that is, human-machine interaction, has always been at the core of intelligence in the security industry. As the core deployed technology of human-machine interaction, speech recognition has already found many footholds in the security industry, mainly in security robots, of which intelligent inspection robots are the typical example.
Like other service robots that can talk, security robots pick up external sound through built-in microphones and recognize and interpret human speech. Once they determine that the speech signals dangerous behavior, they automatically trigger the alarm system and enter a defensive state, thereby protecting the person concerned.
Beyond security robots, speech recognition also plays a key role in the smart hotel scenario within the security industry. In the future hotel recently opened by Alibaba, face recognition is the main technology, but the smart robots that run through the entire service process are just as indispensable. Robots act as the front desk and guide guests throughout their stay, and in the room guests can improve their accommodation experience by talking to Tmall Genie. Whether it is the robot at the front desk or Tmall Genie acting as a waiter, each completes its human-computer interaction through speech recognition. The result is a full-stack voice interaction system, built on speech recognition, that keeps the guest connected anytime and anywhere.
Of course, the application of speech recognition in the security industry also extends to other intelligent scenarios such as smart finance and smart education.
Intelligent voice technology could also serve as a "good helper" for face recognition
Video surveillance built around facial recognition is the main application in the security industry, and it needs little introduction. But in the future, could intelligent voice technology also assist facial recognition and make video surveillance more intelligent?
The market talks mostly about speech recognition, but few companies notice that voiceprint recognition and speech emotion recognition are also part of intelligent voice technology.
Voiceprint recognition, also known as speaker recognition, converts sound signals into electrical signals and then uses a computer to identify the speaker. It divides into speaker identification (determining which of many enrolled speakers is talking) and speaker verification (confirming that a speaker is who they claim to be). Different scenarios call for different variants: identification may be needed to narrow the scope of a criminal investigation, while verification is needed for bank transactions.
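The difference between the two variants can be sketched in a few lines. This is only an illustration, not a description of any product mentioned in this article: it assumes each voice has already been reduced to a fixed-length feature vector (the three-number vectors, the names in `gallery`, and the 0.8 threshold are all made up), and it compares vectors with cosine similarity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def verify(probe, enrolled, threshold=0.8):
    """1:1 check: does the probe match the one claimed speaker?"""
    return cosine(probe, enrolled) >= threshold

def identify(probe, gallery, threshold=0.8):
    """1:N search: return the best-matching enrolled name, or None."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine(probe, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Hypothetical enrolled voiceprints (toy 3-dimensional vectors).
gallery = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
probe = [0.85, 0.15, 0.25]

claimed_ok = verify(probe, gallery["alice"])   # bank-style 1:1 check
who = identify(probe, gallery)                 # investigation-style 1:N search
stranger = identify([0.0, 0.0, 1.0], gallery)  # no enrolled voice is close enough
```

Verification only ever compares against one enrolled voiceprint, so its cost does not grow with the number of users; identification scans the whole gallery and has to be prepared to answer "nobody".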
Speech emotion recognition is one form of emotion recognition, in which a computer automatically identifies the emotional state carried by input speech. The computer uses sensors to measure speech signals spoken with different tones and expressions, and analyzes the structural characteristics and distribution patterns of their time structure, amplitude structure, fundamental frequency structure and formant structure, so as to identify the emotional content implied in the tone of voice.
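Two of the feature families named above, amplitude and fundamental frequency, can be illustrated with a short sketch. It is a simplification under stated assumptions: the input is a synthetic 200 Hz tone rather than real speech, the 50 to 500 Hz search range is an illustrative choice, and the pitch estimate uses plain autocorrelation. Formant analysis and the classifier that would map features to emotions are left out.

```python
import math

def rms_amplitude(frame):
    """Root-mean-square amplitude of one frame (an amplitude feature)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_f0(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate fundamental frequency by picking the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0

# Synthetic stand-in for a voiced speech frame: 50 ms of a 200 Hz tone.
SAMPLE_RATE = 8000
frame = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(400)]

features = {
    "rms": rms_amplitude(frame),
    "f0_hz": estimate_f0(frame, SAMPLE_RATE),
}
```

In a real system these values would be computed frame by frame, and it is their statistics over time (mean, range, rate of change) that would feed an emotion classifier.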
Although the recognition rate of current face recognition technology is as high as 99% or even 99.9%, the remaining 1% or 0.1% is a hard problem that current technology cannot solve. Imagine adding voiceprint recognition and speech emotion recognition to today's face-recognition video surveillance systems: the resulting audio-visual fusion technology (lip reading, for example) might be able to predict and identify the thoughts and behaviors of the people being monitored even when they are silent. Would video surveillance then be lifted to a new level of intelligence, truly achieving "prevention before it happens"?
It is undeniable that a multimodal interactive system combining face recognition, voiceprint recognition and speech emotion recognition should open up many new applications in the security industry, such as scene analysis and event detection. In the new round of AI industry transformation, multimodal technology will also become a key to success.
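One simple way such a multimodal system could combine its recognizers is score-level fusion, sketched below. This is a hypothetical illustration, not a method described by any vendor in this article: the modality names, weights, scores and the 0.7 decision threshold are all assumed, and each recognizer is assumed to output a match score between 0 and 1.

```python
def fuse_scores(scores, weights):
    """Weighted average of per-modality match scores.

    Modalities whose score is None (for example, no audio captured) are
    skipped and the remaining weights are renormalized.
    """
    available = [m for m, s in scores.items() if s is not None]
    total_weight = sum(weights[m] for m in available)
    if total_weight == 0:
        raise ValueError("no usable modality")
    return sum(weights[m] * scores[m] for m in available) / total_weight

# Assumed weights: face is trusted slightly more than voiceprint.
weights = {"face": 0.6, "voiceprint": 0.4}

# A borderline face match (poor lighting) backed by a strong voiceprint match.
scores = {"face": 0.55, "voiceprint": 0.95}
fused = fuse_scores(scores, weights)
accepted = fused >= 0.7  # hypothetical decision threshold

# If the camera is blocked, the decision falls back to the voiceprint alone.
audio_only = fuse_scores({"face": None, "voiceprint": 0.95}, weights)
```

The point of the design is that a case the face recognizer alone would leave undecided can still be resolved when a second, independent signal is available.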
However, intelligent voice still has difficulties to solve before it can establish itself in the security industry
"No voice, no security" sounds like an appealing vision. Unfortunately, many difficulties remain before intelligent voice can "take the lead" in the security industry.
It is widely believed that four "hows" still need answers in the application of artificial intelligence to the security industry: how to create scenario-based AI applications that meet user needs; how to build industry intelligent systems that solve practical problems; how to improve infrastructure, industry standards and security prevention mechanisms; and how to build a new intelligent industry ecosystem of mutual benefit. These four "hows" apply equally to intelligent voice technology in the security industry.
Far-field speech recognition should be the most critical core technology for intelligent speech recognition in the security industry, but it still faces three major technical bottlenecks: echo, noise and reverberation. The most intuitive example is a security robot working in a public area: it receives too many voice signals at once, cannot separate the target voice, and therefore cannot perform recognition normally.
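A microphone array is one common way to attack the noise part of this problem. The sketch below shows delay-and-sum beamforming in its simplest form, as an illustration only: it assumes four microphones, a synthetic 200 Hz tone as the target, independent Gaussian noise on each microphone, and arrival delays that are already known in whole samples. Real far-field systems have to estimate those delays, and echo and reverberation need separate treatment that is not shown here.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Align each channel by its arrival delay (in samples) and average them."""
    length = len(channels[0]) - max(delays)
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(length)
    ]

def rms_error(signal, reference):
    """Root-mean-square difference over the overlapping samples."""
    pairs = list(zip(signal, reference))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

rng = random.Random(0)  # fixed seed so the demo is repeatable
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(400)]
delays = [0, 3, 5, 8]   # assumed arrival delay of the target at each microphone

# Each microphone hears the delayed target plus its own independent noise.
channels = [
    [(clean[t - d] if t >= d else 0.0) + rng.gauss(0.0, 0.5)
     for t in range(len(clean))]
    for d in delays
]

beamformed = delay_and_sum(channels, delays)
error_single = rms_error(channels[0], clean)  # one microphone on its own
error_beam = rms_error(beamformed, clean)     # four microphones combined
```

Because the target adds up coherently after alignment while the independent noise partly cancels, `error_beam` comes out lower than `error_single`.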
Another example is the speech emotion recognition technology mentioned above. Characterizing emotion in speech is in fact much harder than characterizing it in facial expressions: facial expression signals convey personal characteristics and expressions but no linguistic information, while speech signals are mixed, carrying speaker characteristics, emotion, and the vocabulary and grammar emphasized in what is being said. Speech emotion recognition therefore requires much more training data than face recognition.
Beyond the technical difficulties of far-field speech recognition and speech emotion recognition, intelligent voice technology itself still has many unsolved problems, including accents, target speaker separation, mixed-language speech, efficient transfer and data iteration, industry standards, and defense against attacks. These hold back its application at this stage, not only in the security industry but across industries. It seems that it would be more appropriate to describe it as "artificial intelligence".
Summary
The industry generally believes that AI is not just a show of skill, but a real way to promote technological innovation and solve industry problems. Today, as artificial intelligence enters large-scale application, it is all the more necessary to balance "career selection" and "scene selection" properly in order to stand out from homogeneous competition.
How can the technical bottlenecks be broken so that AI empowers all walks of life? The four solutions proposed by Liang Jiaran, Chairman and CTO of Yunzhisheng, may offer a more rational line of thinking: solve the problems deep learning faces in industrial-scale applications; solve the problems of non-big data, end-to-end, and sequence mapping; combine data and knowledge effectively to form an efficient iterative closed loop; and fundamentally improve the machine's cognitive and learning capabilities.
In 2019, artificial intelligence has gradually returned to rationality, and more and more problems have begun to surface. But for the industry, it is both the worst of times and the best of times.