Type in text, get music: this musical version of Stable Diffusion has gone viral. Netizens: the electronic music industry is about to be disrupted | Try it online

Latest update time:2022-12-17
Yuyang from Aofei Temple
Quantum Bit | Public Account QbitAI

Two Princeton alumni have found a whole new way to play with Stable Diffusion.

This is a picture generated by Stable Diffusion:

Don't rush to call it ugly. Look closely: it is actually a spectrogram.

And it is the kind that really can be converted into a piece of music!

The two authors also state:

It is just version 1.5 of Stable Diffusion, fine-tuned.

This music-making Stable Diffusion is called Riffusion (riff + diffusion), and you can try it right now by opening the web page.

Enter a prompt to get a matching piece of music. For example, enter "ballad, female vocal introduction, transition to teen pop star."

The generated music sounds like this:

This musical version of Stable Diffusion drew crowds of netizens as soon as it went online.

The authors themselves quickly responded: don't worry if you can't get in, just wait until we add more GPUs.

Some netizens have begun to worry about electronic music practitioners:

It will hit electronic music like a nuclear bomb.

Which raises the question:

How does Riffusion do it?

As mentioned at the beginning, the authors state that they made no changes to the Stable Diffusion v1.5 model itself.

The model was simply fine-tuned on spectrogram images paired with text.

As a result, Riffusion can generate a spectrogram that matches the prompt.

Some background is needed here: a spectrogram can be computed from audio using the short-time Fourier transform (STFT). The STFT is invertible when both amplitude and phase are kept, so a piece of audio can also be reconstructed from a spectrogram.
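As a reminder of what "reversible" means here, one common textbook formulation of the STFT and its overlap-add inverse is sketched below. The symbols are assumptions introduced for illustration and do not come from the Riffusion authors: x is the audio signal, w an analysis window of length N (taken as zero outside 0..N-1), H the hop between frames, m the frame index and k the frequency bin.

```latex
% Forward STFT: windowed DFT of frame m
X[m,k] = \sum_{n=0}^{N-1} x[n + mH]\, w[n]\, e^{-2\pi i k n / N}

% Each bin splits into an amplitude and a phase
X[m,k] = |X[m,k]|\; e^{\,i \angle X[m,k]}

% Inverse: inverse DFT per frame, then window-weighted overlap-add
x_m[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[m,k]\, e^{2\pi i k n / N},
\qquad
\hat{x}[j] = \frac{\sum_m w[j - mH]\; x_m[j - mH]}{\sum_m w[j - mH]^2}
```

Since x_m[n] = w[n] x[n + mH], the numerator equals x[j] times the denominator, so the reconstruction is exact wherever the denominator is non-zero. The inverse needs the complex values X[m,k], that is, both amplitude and phase.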

However, the authors note that the phase is chaotic and hard for the model to learn. The spectrogram images generated by Riffusion therefore contain only the amplitudes of the sine waves, not their phases.

In practice, when reconstructing the audio clip, the authors used the Griffin-Lim algorithm to approximate the phase.
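To make the idea concrete, here is a minimal pure-Python sketch of Griffin-Lim on a toy signal. It is not the authors' code: the frame length `N`, hop `HOP`, Hann window, random starting phase and all function names are assumptions chosen for illustration, and the naive DFT is only practical at this tiny scale. The loop alternates between "build the signal that best fits the current complex frames" and "keep that signal's phase but restore the known amplitudes".

```python
import cmath
import math
import random

N = 16    # frame length (assumed toy value)
HOP = 4   # hop between frames (assumed toy value)
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]  # periodic Hann
TWIDDLE = [cmath.exp(-2j * math.pi * k / N) for k in range(N)]


def dft(frame, inverse=False):
    """Naive O(N^2) DFT of one length-N frame; fine for a toy frame length."""
    out = []
    for k in range(N):
        acc = 0j
        for n, value in enumerate(frame):
            w = TWIDDLE[(k * n) % N]
            acc += value * (w.conjugate() if inverse else w)
        out.append(acc / N if inverse else acc)
    return out


def stft(signal):
    """Windowed DFT of each frame: a list of frames with N complex bins each."""
    starts = range(0, len(signal) - N + 1, HOP)
    return [dft([signal[s + n] * WINDOW[n] for n in range(N)]) for s in starts]


def istft(frames, length):
    """Least-squares overlap-add: the real signal that best fits the frames."""
    num = [0.0] * length
    den = [0.0] * length
    for m, frame in enumerate(frames):
        samples = dft(frame, inverse=True)
        for n in range(N):
            num[m * HOP + n] += WINDOW[n] * samples[n].real
            den[m * HOP + n] += WINDOW[n] ** 2
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def spectral_error(estimate, magnitudes):
    """Distance between the estimate's amplitudes and the target amplitudes."""
    return math.sqrt(sum((abs(c) - a) ** 2
                         for ef, mf in zip(estimate, magnitudes)
                         for c, a in zip(ef, mf)))


def griffin_lim(magnitudes, length, iterations=20, seed=0):
    """Estimate a signal whose STFT amplitudes match `magnitudes`."""
    rng = random.Random(seed)
    # Start from random phases; the amplitudes are the only known data.
    frames = [[a * cmath.exp(1j * rng.uniform(-math.pi, math.pi)) for a in mf]
              for mf in magnitudes]
    errors = []
    for _ in range(iterations):
        signal = istft(frames, length)   # closest real signal to the frames
        estimate = stft(signal)          # its actual (consistent) STFT
        errors.append(spectral_error(estimate, magnitudes))
        # Keep the estimated phase, restore the known amplitude.
        frames = [[a * cmath.exp(1j * cmath.phase(c)) for a, c in zip(mf, ef)]
                  for mf, ef in zip(magnitudes, estimate)]
    return istft(frames, length), errors


LENGTH = 128
original = [math.sin(2 * math.pi * 0.11 * n) + 0.5 * math.sin(2 * math.pi * 0.23 * n)
            for n in range(LENGTH)]
magnitudes = [[abs(c) for c in frame] for frame in stft(original)]  # phase discarded
reconstructed, errors = griffin_lim(magnitudes, LENGTH)
print(f"amplitude error: first iteration {errors[0]:.3f}, last iteration {errors[-1]:.3f}")
```

The recovered signal generally does not match the original sample by sample; what shrinks from one iteration to the next is the mismatch between its STFT amplitudes and the target. For real audio, an FFT-based library implementation would be the practical choice.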

It is worth mentioning that, just as Stable Diffusion can edit an image based on a text prompt, Riffusion can also modify the details of a piece of music based on text instructions.

For example, take this saxophone riff:

Change it to a piano version:

Smooth transitions

At this point, you may feel that the riffs generated by Riffusion are a bit short.

In fact, Riffusion can also produce longer pieces. The key is how to connect different music clips.

For example, start with a piece of rap, and then naturally transition to jazz:

The authors' strategy is to first select an initial spectrogram, and then repeatedly modify that image by changing the seed and the prompt to produce new variations.

To make the whole piece of music more harmonious and unified, the authors also interpolate in the latent space of the model.

Specifically, they sample the latent space between one prompt under two different seeds, or between two different prompts under the same seed.
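A rough sketch of the "one prompt, two seeds" case is shown below. The article does not say which interpolation scheme the authors use; spherical linear interpolation (slerp) is assumed here because it is a common choice for Gaussian noise vectors. The helpers `noise_latent`, `slerp` and `interpolation_path` are hypothetical names, and a short list of floats stands in for a real latent tensor.

```python
import math
import random


def noise_latent(seed, size=8):
    """Gaussian starting noise for a seed (a stand-in for a real latent tensor)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def slerp(t, v0, v1, eps=1e-7):
    """Spherical linear interpolation between two vectors, for t in [0, 1]."""
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # Degenerate input: fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))  # angle between the vectors
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def interpolation_path(v0, v1, steps):
    """Evenly spaced latents from v0 to v1, endpoints included (steps >= 2)."""
    return [slerp(i / (steps - 1), v0, v1) for i in range(steps)]


start = noise_latent(seed=1)   # same prompt, first seed
end = noise_latent(seed=2)     # same prompt, second seed
path = interpolation_path(start, end, steps=5)
print(len(path), "latents; each would be decoded into one spectrogram clip")
```

In a full pipeline, each latent on the path would be denoised into a spectrogram and converted to audio, so that neighbouring clips differ only slightly. For the "two prompts, one seed" case, the same kind of interpolation could in principle be applied to the two prompts' text-embedding vectors instead; that is an assumption, not something the article spells out.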

About the authors

If you are interested in Riffusion, follow the link at the end of the article to try it yourself.

Finally, it is worth mentioning that Riffusion is actually a "hobby project".

Its authors are two Princeton alumni.

Seth Forsgren studied biology as an undergraduate at Princeton. After graduating, he launched a number of software startup projects, and this year he sold one that turns a mobile phone into a walkie-talkie.

Hayk Martiros is a technical expert at Skydio, an American drone unicorn. He also completed his undergraduate degree at Princeton and later did his graduate studies at Stanford.

Try it online:
https://www.riffusion.com/?&prompt=jack+johnson+vocals

Reference link:
https://www.riffusion.com/about
