Text-to-Music generation using Riffusion and OpenVINO

Riffusion is a latent text-to-image diffusion model capable of generating spectrogram images from any text input. These spectrograms can be converted into audio clips. In general, diffusion models are machine learning systems trained to denoise random Gaussian noise step by step until they arrive at a sample of interest, such as an image. Diffusion models have been shown to achieve state-of-the-art results for generating image data. One downside is that the reverse denoising process is slow. In addition, diffusion models that operate directly in pixel space consume a lot of memory, which becomes unreasonably expensive when generating high-resolution images. It is therefore challenging both to train these models and to use them for inference. OpenVINO brings capabilities to run model inference on Intel hardware and opens the door to the fantastic world of diffusion models for everyone!
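To make the spectrogram-to-audio idea concrete, here is a deliberately naive sketch using only the Python standard library. It treats each row of a small magnitude matrix as one sinusoid (additive synthesis) and writes the result as 16-bit mono WAV. This is not the reconstruction used by Riffusion or by this notebook; real pipelines usually need a phase-reconstruction step, and every name here (`spectrogram_to_samples`, `write_wav`, the bin frequencies, the hop size) is a hypothetical choice made for illustration.

```python
import math
import struct
import wave


def spectrogram_to_samples(spec, bin_freqs_hz, sample_rate=8000, hop=80):
    """Naive additive synthesis.

    spec[k][t] is the magnitude of frequency bin k in time frame t.
    Each frame is rendered as `hop` audio samples.
    """
    n_bins = len(spec)
    n_frames = len(spec[0]) if n_bins else 0
    samples = []
    for t in range(n_frames):
        for i in range(hop):
            n = t * hop + i  # global sample index keeps each sinusoid phase-continuous
            value = sum(
                spec[k][t] * math.sin(2.0 * math.pi * bin_freqs_hz[k] * n / sample_rate)
                for k in range(n_bins)
            )
            samples.append(value)
    # Normalize to [-1, 1] unless the signal is silent.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > 0:
        samples = [s / peak for s in samples]
    return samples


def write_wav(target, samples, sample_rate=8000):
    """Write float samples in [-1, 1] as 16-bit mono PCM to a path or file object."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

For example, a two-bin matrix `[[1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]` with bin frequencies `[440.0, 880.0]` yields two frames of a 440 Hz tone followed by one frame of silence.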

In this tutorial, we consider how to run a text-to-music generation pipeline using Riffusion and OpenVINO. We will use a pre-trained model from the Diffusers library. To simplify the user experience, the Hugging Face Optimum Intel library is used to convert the models to OpenVINO™ IR format.
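As a rough sketch of what the conversion step can look like, the snippet below loads a Stable Diffusion style checkpoint through Optimum Intel and exports it to OpenVINO IR on the fly. The model ID `riffusion/riffusion-model-v1`, the helper names, the device, and the step count are assumptions made for illustration and do not come from this README; the notebook itself may configure the pipeline differently.

```python
# Assumed Hugging Face Hub ID of the Riffusion checkpoint (not stated in this README).
MODEL_ID = "riffusion/riffusion-model-v1"


def load_riffusion_pipeline(model_id=MODEL_ID, device="CPU"):
    """Load the checkpoint and convert it to OpenVINO IR (hypothetical helper)."""
    # Imported lazily so the heavy dependency is only required when a pipeline is built.
    from optimum.intel import OVStableDiffusionPipeline

    # export=True converts the original weights to OpenVINO IR during loading;
    # compile=False postpones compilation until the target device is chosen.
    pipe = OVStableDiffusionPipeline.from_pretrained(model_id, export=True, compile=False)
    pipe.to(device)
    pipe.compile()
    return pipe


def generate_spectrogram(pipe, prompt, steps=20):
    """Run text-to-image inference and return the first generated spectrogram image."""
    result = pipe(prompt, num_inference_steps=steps)
    return result.images[0]


# Example usage (downloads the model on first run, so it is left commented out):
# pipe = load_riffusion_pipeline()
# image = generate_spectrogram(pipe, "bossa nova with distorted guitar")
# image.save("spectrogram.png")
```

The import sits inside the loader so that the module can be imported without Optimum Intel installed; only the call that actually builds the pipeline needs it.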

The complete pipeline of this demo is shown below.

riffusion_pipeline.png

Notebook Contents

This notebook demonstrates how to convert and run Riffusion using OpenVINO.

The tutorial consists of the following steps:

This notebook provides an interactive interface where the user can enter their own musical prompt, and the model generates a spectrogram image and audio guided by that input. The result of the demo is illustrated in the image below.

demo_riffusion.png

Installation Instructions

This is a self-contained example that relies solely on its own code.
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start. For details, please refer to the Installation Guide.