If there were a technology that could copy your voice in a second, would you be amazed or horrified?
In 2019, AI applications have become increasingly diverse. Technology companies such as iFlytek and Sogou have successively released speech synthesis applications that let users change their voices, in about a second, into those of celebrities or anyone else they want to imitate.
Internet technology is quietly changing our lives. For AI companies, deploying voice recognition technology at scale is no longer difficult. However, as AI develops, the ethical and security risks behind it may become issues that cannot be ignored.
Real-time voice changing is popular among AI companies: a voice can be changed in one second
"Hi, everyone. I'm very happy to be here today at iFlytek's new product launch. I've always liked iFlytek..."
This happened at iFlytek's 2019 new product launch, where Chairman Liu Qingfeng used the technology to simulate the voices of Shan Tianfang, Lin Chiling and Luo Yonghao for his opening remarks. When Luo Yonghao's voice came on, many in the audience thought Luo Yonghao himself was present.
"You see Liu Qingfeng, but you hear Lao Luo's voice," Liu Qingfeng said on stage, explaining that this was the company's latest real-time voice-changing technology. Reportedly, the new speech synthesis technology needs only a one-minute voice sample to imitate anyone's speech.
It is not only iFlytek. Recently, Sogou CEO Wang Xiaochuan demonstrated Sogou's voice-changing function at a conference. Using a mobile app, he simulated the voices of Gao Xiaosong and a girl from Northeast China, drawing laughter from the audience. He then demonstrated voice replacement in a song. According to the presentation, the system first trained on 14 minutes of his voice and then transferred the timbre.
This is Sogou's latest speech synthesis technology, which can transform anyone's voice into a specific target voice, such as Lin Chiling's or Jack Ma's, within seconds. Wang Xiaochuan said this is not just simple speech synthesis: it can also transfer voice, tone and emotion.
Sogou Input Method users can now change their voices into those of their favorite characters and use them in popular social apps such as WeChat, QQ and Momo. Sogou offers 19 voices across several categories, including celebrities, cartoon characters, game IPs and dialects.
In fact, speech synthesis is not a new technology. Previously, it was mostly used to convert text into speech, as in navigation, transcription, smart speakers, Siri and other voice assistants, rather than to reproduce real people speaking.
This year, many AI companies have focused on applying speech synthesis to voice changing and "voice cosplay", converting real people's voices into specific target voices.
Baidu has also put related technology into practice. In early May this year, on the CCTV public-welfare program "Wait for Me", Baidu Brain used its intelligent voice technology to synthesize the voice of a deceased veteran, helping old comrades who had been separated for 64 years to "reunite".
According to reports, the technology uses Baidu's end-to-end speech style separation and modeling solution, in which multiple neural networks independently encode and model different dimensions of speech, such as timbre, emotion and style, to guide the final synthesis.
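The idea of encoding timbre, emotion and style separately can be illustrated with a toy sketch. This is not Baidu's implementation: the encoders below are hypothetical linear projections with hand-picked weights standing in for trained neural networks, the four-number "features" are made up, and the decoder that would turn the conditioning vector into audio is omitted. What it shows is why separating the dimensions matters: swapping only the timbre code changes whose voice would be heard while keeping the source speaker's emotion and style.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

# Fixed order in which the per-dimension codes are concatenated for the decoder.
DIMENSIONS = ("timbre", "emotion", "style")

# Hypothetical, hand-picked weights; a real system would train one neural
# network per dimension instead of using fixed linear projections.
ENCODERS: Dict[str, Matrix] = {
    "timbre": [[1.0, 0.0, 0.0, 0.0]],
    "emotion": [[0.0, 1.0, 0.0, 0.0]],
    "style": [[0.0, 0.0, 1.0, 1.0]],
}


def project(x: Vector, weights: Matrix) -> Vector:
    """Apply one linear encoder: each output is a weighted sum of the inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def encode(features: Vector) -> Dict[str, Vector]:
    """Encode each speech dimension independently with its own encoder."""
    return {name: project(features, ENCODERS[name]) for name in DIMENSIONS}


def conditioning_vector(codes: Dict[str, Vector]) -> Vector:
    """Concatenate the per-dimension codes that would guide the synthesizer."""
    out: Vector = []
    for name in DIMENSIONS:
        out.extend(codes[name])
    return out


def swap_timbre(source: Dict[str, Vector], target: Dict[str, Vector]) -> Dict[str, Vector]:
    """Keep the source's emotion and style but take the target speaker's timbre."""
    mixed = dict(source)  # shallow copy, so the source codes are left untouched
    mixed["timbre"] = target["timbre"]
    return mixed


if __name__ == "__main__":
    source = encode([0.2, 0.9, 0.1, 0.3])  # hypothetical features of the speaker
    target = encode([0.8, 0.1, 0.5, 0.5])  # hypothetical features of the target voice
    print(conditioning_vector(swap_timbre(source, target)))
```

In a real system each encoder would be a deep network trained so that its code captures only one factor; the point of the sketch is the structure, not the numbers.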
These applications reflect the progress of AI and the inclusive value it can bring to society. Sogou, for example, is combining voice changing, AI synthetic anchors and related technologies with industries such as media, education, content production and tourism, which opens up broader possibilities for creating value.
On the other hand, the risks of technical loopholes and future abuse cannot be ignored. Some netizens warned "beware of it being used for telecom fraud" and "you may get a call from 'Jack Ma' one day"...
An insider in the audio industry believes the technology should be useful for tool-type products that use audio as a means of interaction, but its value for online audio platforms that use audio as a content carrier remains to be seen.
Therefore, while pursuing technological breakthroughs and commercial value, enterprises should also take responsibility for the security of their technology.
Speech synthesis still has many practical shortcomings
Realistic speech synthesis is reportedly built on neural networks and machine learning. Neural networks loosely mimic how electrical signals pass between neurons in the human brain in order to process input data: layers of artificial neurons learn to extract common features from large amounts of sample data.
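To make "layers of neurons" concrete, here is a minimal sketch of a forward pass through a two-layer feed-forward network in plain Python. It is illustrative only and assumes hand-picked weights: in a real speech model the weights are learned from large amounts of sample audio (the training step is not shown), the networks are far larger, and the inputs would be acoustic features rather than two numbers.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, biases) for one layer of neurons


def dense(x: Vector, weights: Matrix, biases: Vector) -> Vector:
    """Each neuron outputs a weighted sum of all inputs plus its bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x: Vector) -> Vector:
    """Pass positive signals through and silence negative ones."""
    return [max(0.0, v) for v in x]


def sigmoid(x: Vector) -> Vector:
    """Squash each value into the range (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Feed the input through each layer: ReLU on hidden layers, sigmoid on the last."""
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        x = sigmoid(x) if i == len(layers) - 1 else relu(x)
    return x


if __name__ == "__main__":
    # Hand-picked weights for illustration; training would learn these from samples.
    layers = [
        ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer: 2 neurons
        ([[1.0, 1.0]], [-1.0]),                    # output layer: 1 neuron
    ]
    print(forward([2.0, 1.0], layers))
```

Each layer only transforms the previous layer's output, which is what lets stacked layers build progressively more abstract features from raw input.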
Commercially, speech synthesis is already used in fields such as voice interaction, audiobooks, new media, intelligent customer service and pan-entertainment.
In an interview, Niu Sen, head of the education category at Qingting FM (Dragonfly FM), said that in the audio industry speech synthesis will greatly reduce the staffing, time and financial costs of converting text content into audio.
On voice cosplay, Niu Sen pointed out several practical flaws. For example, synthesized audio cannot fully match a real human voice in emotion and emotional expression.
He said that for audio listeners, hearing a script read aloud and hearing the same content narrated are very different experiences. Only an authentic human voice can evoke deep emotional resonance, and that is where the value of audio lies.
On ethics and safety, Niu Sen believes that human voices and synthesized voices must first be screened and verified technically, and that the chain of copyright must be clarified from a rights perspective. Any unauthorized synthesized audio constitutes infringement and is illegal. "As a platform, we will exercise strict copyright and quality control."
Reportedly, on some audio platforms speech synthesis is mainly used for children's programs. For other content, the AI simulation is not yet good enough and has not been widely adopted.
Regarding the security risks of speech synthesis, Liu Qingfeng stressed on the spot after the voice-changing technology was released: for artificial intelligence to keep developing, the most important thing is that its values be positive, healthy and kind to people. Therefore, we will certainly not casually open up a "black technology" like voice changing to the outside world in all kinds of apps. There must be a healthy, safe and interesting way to connect it with the world.
Liu Qingfeng has also said that the AI field requires not only technical cooperation but also legal and ethical cooperation.
Regarding security, Sogou told Sina Technology: "Technology is a double-edged sword. It can be used for good or bring disaster. Sogou is committed to using technology for good. Voice changing is a cutting-edge application of artificial intelligence. Based on speech representation learning and transfer learning, it can convert anyone's voice into a specific person's voice (Any-to-One). Sogou has made a breakthrough here and is the first to bring it to the practical stage. The technology can also be applied to film and television dubbing, family companionship and other scenarios, helping people work more efficiently and live more happily."
Sogou said that, to prevent the technology from being abused by people with ulterior motives, it has imposed strict management and restrictions:
1. Sogou does not export voice-changing technology to third parties, ensuring that it remains controllable and secure.
2. All target timbres of the voice-changing function are defined by Sogou; users cannot imitate arbitrary voices.
3. Changed voices can be used in apps such as WeChat and QQ, but cannot be forwarded or copied, and the sender can be traced.
Wang Xiaochuan has also discussed AI legislation in a media interview: at the current stage of AI development, the most practical way to deal with the legal and ethical risks of AI is to keep adjusting and improving the rules as quickly as technology develops.
However, technology is currently still ahead of ethics and law. At the World Intelligence Conference in May this year, Zhou Hongyi remarked that in AI, a system designed without humanistic thinking could end in tragedy.
Humanistic thinking behind AI technology
In fact, the blurring of fake and real brought about by AI is not limited to sound. A Samsung research project has also drawn attention recently.
According to foreign media reports, researchers at Samsung's AI lab in Moscow trained deep convolutional neural networks on large amounts of animated images and video footage. The resulting system can accurately identify facial features and turn still images into animated images or even videos.
In experiments, the researchers used still images of Einstein, Marilyn Monroe and even the Mona Lisa to generate videos of them speaking, although the video quality is currently low.
This suggests that, as AI image generation advances, fake videos may one day be generated from a single photo.
Before this, AI face-swapping had already sparked heated discussion on social media. Someone replaced the face of Athena Chu, who played Huang Rong in the 1994 version of The Legend of the Condor Heroes, with Yang Mi's, and netizens called the result "perfectly harmonious" and "indistinguishable from the real thing", even joking that it was "the most cost-effective way to reshoot an old drama".