China's first "ChatGPT for music" is here! Sora-style architecture, trained in both singing and composing, plus a first look at a new MoE large model

Last updated: 2024-04-03

Yun Zhong, reporting from Aofeisi
Qubit | Public account QbitAI

Large AI music models have been so popular lately that they hardly need an introduction, right?

But setting everything else about the overseas apps aside, their odd AI-generated Chinese pronunciation alone is enough to make listeners cringe...

Fortunately, when it comes to competing on applications, China's large-model makers fear no one. Sure enough, a domestic "ChatGPT for music" has arrived~

Without further ado, let’s listen to the effect first:

With this much emotional expressiveness, it could well rival the viral hits of short-video platforms.

The creator behind this piece is "Tiangong SkyMusic", a domestic large AI music generation model that has just opened for testing.

On April 2, Kunlun Wanwei officially announced that "Tiangong SkyMusic", built on its "Tiangong 3.0" super model, is now open for free public testing.

This round of testing offers 1,000 free slots for industry media, experts, and interested music practitioners.

According to the company, Kunlun Wanwei's "Tiangong SkyMusic" received hundreds of thousands of reservation requests on its first day.

Some users have already tried it and posted their work:

"Tiangong SkyMusic" is also the only publicly available large-scale AI music generation model in China.

Kunlun Wanwei engineers revealed that "Tiangong SkyMusic" is a major result of the company's research into emotional AGI:

Intelligence is important, but emotions are the key to our ability to be called human beings.

We found that, compared with text and images, audio is the best medium for understanding human emotion, and music is the richest carrier of emotional expression, one that is not restricted by geography or culture.

Built on a self-developed, Sora-like model architecture

Let’s look at the technical details.

"Tiangong SkyMusic" applies a Sora-like model architecture to music audio:

A large-scale Transformer handles composition: it learns the contextual dependencies among Music Patches while also making the music controllable;

A Diffusion Transformer handles the singing: it uses an LDM (latent diffusion model) to restore Music Patches into high-quality audio, which lets "Tiangong SkyMusic" generate 80-second, two-channel stereo songs at a 44,100 Hz sampling rate.
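To put the stated output format in perspective, here is a rough back-of-the-envelope calculation of how much raw audio one clip contains. The duration, sampling rate, and channel count come from the article; the 16-bit sample depth is an assumption, since no bit depth is given. The large sample count hints at why a model might work on compact patches rather than on raw samples, though the article does not spell out that reasoning.

```python
# Back-of-the-envelope size of one generated clip.
DURATION_S = 80          # from the article
SAMPLE_RATE_HZ = 44_100  # from the article
CHANNELS = 2             # from the article (two-channel stereo)
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit PCM; the article states no bit depth


def clip_stats(duration_s: int, sample_rate_hz: int, channels: int, bytes_per_sample: int):
    """Return (frames, samples, raw_bytes) for uncompressed PCM audio."""
    frames = duration_s * sample_rate_hz    # one frame = one sample per channel
    samples = frames * channels             # individual amplitude values
    raw_bytes = samples * bytes_per_sample  # uncompressed size
    return frames, samples, raw_bytes


frames, samples, raw_bytes = clip_stats(DURATION_S, SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_SAMPLE)
print(f"frames:   {frames:,}")                       # 3,528,000
print(f"samples:  {samples:,}")                      # 7,056,000
print(f"raw size: {raw_bytes / 1_000_000:.1f} MB")   # 14.1 MB under the 16-bit assumption
```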

This model architecture performs extremely well across video, audio, and music. The Kunlun Wanwei team also plans to iterate gradually and add new capabilities, so that the model gains multi-modal emotional understanding and expression.

"Tiangong SkyMusic" has the following five major features:

High-quality AI music

"Tiangong SkyMusic" can generate 80-second, two-channel stereo AI songs at a 44,100 Hz sampling rate, and matches the style of each song to the style of the lyrics the user enters.

Vocals realistic enough to pass for the real thing

Vocal synthesis is the dimension of AI music generation that best reflects output quality. The AI vocal synthesis of "Tiangong SkyMusic" reaches state-of-the-art (SOTA) level in the industry; its Chinese pronunciation in particular is clear and free of noise, and its singing is noticeably better than that of overseas products.

Lyric section control

"Tiangong SkyMusic" lets lyrics steer the song, so that the generated music clearly reflects the emotional shifts between lyric sections, such as the contrast between verse and chorus or between intro and verse.

A wide range of music styles

"Tiangong SkyMusic" supports rap, folk, funk, guofeng (traditional Chinese style), electronic, and other music styles. When creating music, users can specify the desired style by providing reference audio.

For example, in rap style, the effect is as follows:

Intelligent Expression of Music: Learning Singing Skills

"Tiangong SkyMusic" can also learn a variety of singing techniques such as vibrato, opera, singing, male and female duets, automatic harmony, etc., so that the songs created by users can achieve more appropriate emotional expression.

Built on the "Tiangong 3.0" large model

Behind "Tiangong SkyMusic", one thing worth noting is that Kunlun Wanwei also revealed the latest information about its own MoE large model "Tiangong 3.0":

On April 17, "Tiangong 3.0" will officially enter public beta and be open-sourced at the same time.

"Tiangong 3.0" is a Mixture-of-Experts (MoE) model with parameters on the order of 400 billion. It is one of the largest and best-performing MoE models in the world.

According to the company, compared with the previous-generation "Tiangong 2.0" MoE large model, "Tiangong 3.0" shows significant improvements in semantic understanding, logical reasoning, versatility, generalization, handling of uncertain knowledge, and learning ability. Its technical knowledge capability has improved by more than 20%, and its mathematics, reasoning, coding, and cultural-creative abilities have improved by more than 30%.
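For readers unfamiliar with the MoE (Mixture-of-Experts) design mentioned above, the following is a minimal, generic sketch of top-k expert routing. It is not Kunlun Wanwei's implementation: the article does not describe how "Tiangong 3.0" routes tokens, and every name and number here (`moe_forward`, `make_expert`, the toy gate weights, four experts, `top_k=2`) is a hypothetical chosen for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale, calls, name):
    """Hypothetical 'expert': scales its input and records that it ran."""
    def expert(x):
        calls.append(name)
        return [scale * xi for xi in x]
    return expert


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    # Gate: one score per expert (dot product of that expert's gate row with x).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    mix = softmax([logits[i] for i in chosen])  # renormalize over the chosen experts only
    out = [0.0] * len(x)
    for weight, idx in zip(mix, chosen):
        y = experts[idx](x)  # only the chosen experts are evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen


calls = []
experts = [make_expert(s, calls, f"expert{i}") for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
gate_weights = [[0.1, 0.0], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]  # toy values, not learned
out, chosen = moe_forward([1.0, 2.0], gate_weights, experts, top_k=2)
print(chosen)  # [1, 3]
print(calls)   # ['expert1', 'expert3']
```

The point of the design is that only `top_k` of the experts run for a given input, so the total parameter count can grow without every parameter being used on every token.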

At the same time, "Tiangong 3.0" adds enhanced search, a research mode, code invocation and chart drawing, multi-round web search calls, and more. The model's agent capabilities have also been specifically trained, so that "Tiangong 3.0" can independently plan, call, and combine external tools and information to handle complex needs such as industry analysis and product comparison accurately and efficiently.

"Tiangong 3.0" is also the world's first multi-modal "super model", integrating AI search, AI writing, AI long-text reading, AI conversation, AI speech synthesis, AI image generation, AI comic creation, AI image recognition, AI music generation, AI code writing, AI table generation, and many other capabilities. It can be called a "super application" of the large-model era.

For more details, let's look at four aspects:

Stronger logical reasoning ability

Improved logical reasoning is crucial for large models that must solve complex problems. The mathematics and reasoning capabilities of "Tiangong 3.0" have both improved by more than 30%, and this stronger logical reasoning lets it process information more accurately and efficiently in practical applications.

For example, in the research mode of "Tiangong 3.0" AI search, the model can expand a simple user instruction into a set of related questions and judge in real time whether each part of the answer requires an online search. This enables complex functions such as breaking down and analyzing an industry in detail, summarizing related events, and mapping out an industrial chain, with the final result presented in a structured form or as a mind map, making the model "smarter".

Better semantic understanding

"Tiangong 3.0" can better understand and process complex semantics in users' natural-language queries, including metaphor and polysemy.

For example, in the enhanced search mode of "Tiangong 3.0" AI search, the model can break down and refine a user's complex query, ask follow-up questions, and understand and fill in missing information. This makes its natural-language understanding stronger: it performs better when faced with uncertain knowledge and meets user needs more accurately and efficiently.

Dedicated agent training for handling complex needs

In the era of large models, AI agents have become the mainstream path for putting large-model technology into practice.

"Tiangong 3.0" has been specifically trained to plan, call, and combine external tools and information on its own. It can independently generate and invoke code to handle a wide range of complex user needs, including industry research, product reviews, information analysis, image generation, and chart drawing, making it an all-round expert with professional knowledge and skills across multiple fields. Drawing on its strong semantic understanding and logical reasoning, it works out what the user actually needs, breaks the task into subtasks, and dispatches each one to the best-suited model for processing, maximizing overall model performance.

At the same time, for business (B-side) users, "Tiangong 3.0" has been comprehensively upgraded in knowledge base capabilities, arbitrary tool calling, and instruction following for complex roles. Enterprise users can build dedicated knowledge bases and agents by uploading knowledge documents, and gain practical capabilities such as automatically calling specified tools and building agents that follow complex instructions.
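The "break the task into subtasks and dispatch each to a suitable tool" pattern described in this section can be sketched roughly as follows. This is only an illustration of the general idea, not the "Tiangong 3.0" agent: the handler functions, the `HANDLERS` table, and the fixed three-step `plan` are all hypothetical stand-ins, and in a real agent the model itself would produce the plan.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tool handlers; a real system would call search, plotting, or model APIs.
def search_web(task: str) -> str:
    return f"[search] {task}"


def draw_chart(task: str) -> str:
    return f"[chart] {task}"


def write_text(task: str) -> str:
    return f"[text] {task}"


HANDLERS: Dict[str, Callable[[str], str]] = {
    "search": search_web,
    "chart": draw_chart,
    "write": write_text,
}


def plan(request: str) -> List[Tuple[str, str]]:
    """Stand-in planner: returns (tool, subtask) pairs in execution order.

    ASSUMPTION: fixed three-step plan; a real agent would generate this with the model.
    """
    return [
        ("search", f"collect data for: {request}"),
        ("chart", f"plot key figures for: {request}"),
        ("write", f"summarize findings for: {request}"),
    ]


def run_agent(request: str) -> List[str]:
    """Execute each planned subtask with the handler registered for its tool."""
    return [HANDLERS[tool](subtask) for tool, subtask in plan(request)]


results = run_agent("compare two products")
for line in results:
    print(line)
```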

Comprehensive upgrade of content creation capabilities

Content creation has always been a strength of the "Tiangong" series of large models. Building on the previous-generation "Tiangong 2.0", "Tiangong 3.0" comprehensively upgrades these capabilities. It not only offers powerful content creation features such as AI music generation, AI voice, AI dialogue, and AI anime-style comic generation; through dedicated agent training, it can also generate images in real time from text requests during a conversation, and analyze content and build charts in real time from text requests. The result is a super model that can truly search, write, read, chat, listen, speak, draw, see, and sing.

Fang Han, chairman and CEO of Kunlun Wanwei, said that "super models" are an inevitable development of the large-model era. In the future there will be more than one "super model" in the industry, and Kunlun Wanwei will keep moving in this direction, striving to provide users with smarter, more efficient, and more reliable artificial intelligence services.

All in AGI and AIGC

Since adopting its "All in AGI and AIGC" strategy in 2023, Kunlun Wanwei has launched a series of cutting-edge AI application products built around its self-developed "Tiangong" series of large models:

In August 2023, Kunlun Wanwei launched Tiangong AI Search, the first domestic AI search product.

In September, Kunlun Wanwei launched the multi-modal large model Skywork-MM, which ranked first in the comprehensive score in the multi-modal large language model evaluation MME.

In October, Kunlun Wanwei launched Skywork-13B, a series of open-source large language models at the ten-billion-parameter scale.

In December, Kunlun Wanwei released SkyAgents, China’s leading AI Agent development platform.

In February 2024, the Tiangong base model received its largest update since launch, Tiangong 2.0, becoming the first large language model AI application in China with hundreds of billions of parameters and an MoE architecture to be open to all consumer (C-side) users for free.

With the newly unveiled Tiangong SkyMusic, Kunlun Wanwei has built, on top of the Tiangong series of large models, an AI business matrix spanning AI large models, AI search, AI music, AI social networking, AI animation, and AI games. It is one of the Chinese artificial intelligence companies with the strongest model technology and engineering capabilities and the most comprehensive layout.

With a track record like this, the experience "Tiangong 3.0" will deliver this time is well worth looking forward to.

We will also evaluate the experience as soon as possible. If you have anything you want to test, please tell us in the comment area~

*This article was published with permission from Qubit, and the views are solely those of the author.

-over-