| Riffusion | |
|---|---|
| Developers | Seth Forsgren, Hayk Martiros |
| Initial release | December 15, 2022 |
| Repository | GitHub |
| Written in | Python |
| Type | Text-to-image model |
| License | MIT License |
| Website | riffusion.com |
Riffusion is a neural network, designed by Seth Forsgren and Hayk Martiros, that generates music using images of sound rather than audio.[1]
The resulting music has been described as "otherworldly",[2] although unlikely to replace human-made music.[2] The model was made available on December 15, 2022, with the code also freely available on GitHub.[3]
The first version of Riffusion was created as a fine-tuning of Stable Diffusion, an existing open-source model for generating images from text prompts, on spectrograms,[1] resulting in a model that used text prompts to generate image files, which could then be put through an inverse Fourier transform and converted into audio files.[3] While these files were only several seconds long, the model could also use latent space between outputs to interpolate different files together[1][4] (using the img2img capabilities of SD).[5] It was one of many models derived from Stable Diffusion.[5]
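As a rough illustration of the image-to-audio step described above, the sketch below decodes a spectrogram image's pixel intensities into a magnitude spectrogram and reconstructs a waveform with the Griffin-Lim algorithm, a common way to perform this kind of inversion. The function name, file names, decibel range, and hop length are illustrative assumptions, not details of Riffusion's actual pipeline.

```python
# Minimal sketch (assumed parameters): turn a spectrogram image back into audio.
import numpy as np
from PIL import Image
import librosa
import soundfile as sf

def spectrogram_image_to_audio(path: str, hop_length: int = 512) -> np.ndarray:
    # Load as grayscale and flip vertically so row 0 is the lowest frequency
    # bin, since image coordinates put the origin at the top-left.
    img = np.flipud(np.asarray(Image.open(path).convert("L"), dtype=np.float32))
    # Assumed mapping from pixel value to decibels (0 -> -80 dB, 255 -> 0 dB).
    log_mag = img / 255.0 * 80.0 - 80.0
    magnitude = librosa.db_to_amplitude(log_mag)
    # Griffin-Lim iteratively estimates the phase that the magnitude-only
    # image discarded, completing the spectrogram-to-waveform inversion.
    return librosa.griffinlim(magnitude, hop_length=hop_length)

audio = spectrogram_image_to_audio("riff.png")  # hypothetical generated image
sf.write("riff.wav", audio, 44100)
```

Because the spectrogram image stores only magnitudes, the phase information must be re-estimated during inversion, which is why an iterative method is used rather than a single inverse transform.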
In December 2022, Mubert[6] similarly used Stable Diffusion to turn descriptive text into music loops. In January 2023, Google published a paper on their own text-to-music generator called MusicLM.[7][8]
Forsgren and Martiros formed a startup, also called Riffusion, and raised $4 million in venture capital funding in October 2023.[9][10]