OpenAI officially releases the technology it kept under wraps for more than a year: voice cloning from a 15-second sample, and HeyGen uses it too

Last updated: 2024-03-30

By Cressy, from Aofei Temple
Qubits | WeChat official account QbitAI

OpenAI's long-withheld speech synthesis engine, Voice Engine, has finally been officially unveiled.

It needs only a 15-second voice sample to clone a person's voice, and the cloned voice carries across languages!

The voice conversation feature in the ChatGPT app is also powered by this technology.

How well does it work? Let's listen to a demo first:

Salt also makes sure we stay hydrated which means there is enough water in our body for it to properly function.

According to OpenAI's announcement, the technology was developed at the end of 2022 but was not officially released because of safety concerns.

Now OpenAI has officially announced Voice Engine and shown several use cases from small-scale testing.

For example, a nonprofit medical organization used the technology to restore a young patient's voice.

It is also worth noting that HeyGen, the video translation tool that became popular last year, uses Voice Engine.

So let's take a look at what else OpenAI has shown this time.

Using AI to help patients regain their voices

The first use case applies basic speech synthesis to provide reading assistance to people who cannot read text, such as young children.

For example, a children's education technology company has been using Voice Engine to generate voice-overs for pre-scripted content.

The long passages generated in the demo are all based on this 15-second sample:

From it, long stretches of speech in the same timbre can be synthesized:

Next, let's look at the speech translation used in HeyGen. The source material is an English audio clip:

It was translated into Mandarin, French, German, and other languages while keeping the original speaker's voice.

Setting translation quality aside and judging only the voice, the Chinese version sounds like this:

The timbre is well preserved, but the accent clearly sounds like a non-native speaker speaking Chinese.

As for whether this is a bug or a feature, it's a matter of opinion.

In addition, Livox, an assistive app for people with disabilities, uses Voice Engine to give a voice to people who cannot speak:

With Voice Engine, they can choose a distinctive, natural-sounding human voice instead of a mechanically synthesized one, and the voice stays consistent across languages.

Beyond giving people with disabilities a voice of their own, Voice Engine can also restore the pre-illness voice of people whose speech has changed substantially because of illness, provided earlier voice samples exist.

A young patient suffered from a vascular brain tumor and lost the ability to speak fluently. Her speech became like this:

Doctors extracted samples of her pre-illness voice from videos recorded at her school and used Voice Engine to restore her earlier timbre.

The cases released this time, especially those that help people with disabilities, have drawn a lot of praise, but some commenters online have also voiced concern that the technology could be abused.

Safety requires the attention of society as a whole

In fact, safety is the main reason OpenAI held off on making this technology public.

For safety reasons, the developers behind the cases above were strictly vetted by OpenAI and had to agree to abide by its usage policies.

These developers were required to disclose clearly that the voices were synthesized, and blocklists were set up to prevent the cloning of public figures' voices.

OpenAI also added watermarks to the synthesized audio so that it can be traced and monitored if problems arise, and called for the following steps to address the issue collectively:

Phase out voice-based authentication as a security measure for access to bank accounts and other sensitive information
Explore measures to protect individuals' voices in the AI era
Educate the public about the capabilities and limitations of AI, including its potential use for fraud
Accelerate the development of provenance-tracking technology so that people can clearly tell whether they are dealing with a real person or an AI

Reference link:
https://openai.com/blog/navigating-the-challenges-and-opportunities-of-synthetic-voices
