Geely releases a new generation of speech synthesis model with voice cloning capabilities

Publisher: 科技创新实践者 Latest update time: 2024-04-13

Geely recently announced another advance for its Star Rui AI large model: the official release of the new-generation HAM-TTS speech synthesis large model. According to Geely, the model introduces a module that predicts acoustic information from text, which lets it synthesize natural, fluent, and emotionally expressive speech from a given text. It also has a strong voice cloning capability: from only a few seconds of reference audio it can reproduce a realistic voice, giving users a lifelike voice interaction experience.

According to Geely, the new-generation HAM-TTS model addresses the data collection problem by expanding its training data to more than 650,000 hours and its parameter count to 800 million. Geely has also adopted a data augmentation strategy: it deliberately introduces "noise" into the training data by splicing and replacing segments, which improves the model's ability to recognize timbre and makes the timbre of the synthesized audio more stable, more coherent, and closer to a human voice.
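
The article does not say how the splicing and replacement are implemented. As a rough illustration only, the sketch below assumes each waveform is a plain list of audio samples and shows one plausible reading of the two operations: overwriting a random span of one utterance with audio from another, and building a new sequence from spans of several utterances. The function names `replace_segment` and `splice_segments`, the segment lengths, and the toy data are all hypothetical and are not taken from Geely's description.

```python
import random
from typing import List, Sequence


def replace_segment(target: Sequence[float], donor: Sequence[float],
                    seg_len: int, rng: random.Random) -> List[float]:
    """Return a copy of `target` with one random span overwritten by `donor` audio.

    Hypothetical helper: an assumed reading of "replacement" augmentation.
    """
    if seg_len <= 0 or seg_len > min(len(target), len(donor)):
        raise ValueError("seg_len must fit inside both waveforms")
    t_start = rng.randrange(len(target) - seg_len + 1)
    d_start = rng.randrange(len(donor) - seg_len + 1)
    out = list(target)  # copy, so the original utterance is left untouched
    out[t_start:t_start + seg_len] = donor[d_start:d_start + seg_len]
    return out


def splice_segments(utterances: Sequence[Sequence[float]], seg_len: int,
                    n_segments: int, rng: random.Random) -> List[float]:
    """Concatenate `n_segments` random spans drawn from the given utterances.

    Hypothetical helper: an assumed reading of "splicing" augmentation.
    """
    if seg_len <= 0 or any(len(u) < seg_len for u in utterances):
        raise ValueError("every utterance must be at least seg_len samples long")
    out: List[float] = []
    for _ in range(n_segments):
        src = rng.choice(utterances)
        start = rng.randrange(len(src) - seg_len + 1)
        out.extend(src[start:start + seg_len])
    return out


if __name__ == "__main__":
    rng = random.Random(0)        # fixed seed so runs are repeatable
    speaker_a = [0.0] * 100       # toy stand-ins for real waveforms
    speaker_b = [1.0] * 100
    mixed = replace_segment(speaker_a, speaker_b, seg_len=10, rng=rng)
    spliced = splice_segments([speaker_a, speaker_b], seg_len=5, n_segments=4, rng=rng)
    print(len(mixed), len(spliced))
```

In a real pipeline these operations would more likely be applied to acoustic features or audio prompts than to raw sample lists, and the span lengths would be tuned; neither detail is given in the source.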


The new-generation HAM-TTS model also supports cross-language switching, and it can adjust multiple parameters such as tone, intonation, pauses, and emotion to suit the requirements of a specific scenario.


On January 11, 2024, Geely officially released the Star Rui AI large model. It uses the Star Rui Intelligent Computing Center as its computing base and deeply integrates Geely's self-developed foundation model with its NPDS R&D system and a large database of scenarios covering the full vehicle-manufacturing chain. It is intended to be an automotive-industry large model with a wide range of application scenarios, strong computing power, a complete body of automotive domain knowledge, and secure, reliable data and models.

