Model Card for SonicVerse

SonicVerse is a music captioning model. Trained with concrete music-feature labels to guide the captioning process, it includes attributes such as key, presence of vocals, vocals gender, instrumentation, mood/theme, and genre in the generated caption. The model is trained on 10-second snippets of music for detailed captioning. The Spaces demo chains captions from multiple 10-second chunks of music to generate a long, detailed caption.
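The chunking used by the demo can be sketched as follows. This is an illustrative helper only, not the actual SonicVerse preprocessing code; the function name, sample rate, and zero-padding behavior are assumptions.

```python
import numpy as np

def chunk_audio(waveform: np.ndarray, sample_rate: int, chunk_seconds: int = 10):
    """Split a mono waveform into consecutive fixed-length chunks.

    The final partial chunk is zero-padded to the full length so every
    chunk can be captioned the same way. (Illustrative helper; the real
    SonicVerse preprocessing may differ.)
    """
    chunk_len = chunk_seconds * sample_rate
    n_chunks = int(np.ceil(len(waveform) / chunk_len))
    chunks = []
    for i in range(n_chunks):
        chunk = waveform[i * chunk_len:(i + 1) * chunk_len]
        if len(chunk) < chunk_len:
            chunk = np.pad(chunk, (0, chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks

# Example: 25 seconds of audio at 16 kHz -> three 10-second chunks
audio = np.zeros(25 * 16000, dtype=np.float32)
chunks = chunk_audio(audio, sample_rate=16000)
print(len(chunks), len(chunks[0]))  # 3 160000
```

Each chunk is then captioned independently, and the per-chunk captions are combined into one long description.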

Model Details

Model Description

SonicVerse is trained with a multi-task projector that maps music input to aligned language tokens. In addition, feature-extraction heads (e.g., key classification, vocals classification) are trained, and their outputs are projected to language tokens that guide the captioning.
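A minimal numpy sketch of this idea (illustrative only; all dimensions, names, and the pooling scheme are assumptions, not the actual SonicVerse architecture): the music encoder's frame features are projected into the language model's token-embedding space, and each feature head's output is projected the same way, so the token sequence fed to the LLM carries both the raw audio representation and an explicit feature cue.

```python
import numpy as np

rng = np.random.default_rng(0)

d_audio, d_text = 768, 4096   # assumed encoder / LLM embedding sizes
n_frames = 50                  # assumed frame count for one 10-second snippet

# Projection from audio features into language-token embedding space
W_proj = rng.normal(size=(d_audio, d_text)) * 0.02

# One linear "feature head" per task (e.g. key classification), each with
# its own projection into the token-embedding space
n_keys = 24                    # e.g. 12 keys x major/minor (assumption)
W_key_head = rng.normal(size=(d_audio, n_keys)) * 0.02
W_key_proj = rng.normal(size=(n_keys, d_text)) * 0.02

audio_feats = rng.normal(size=(n_frames, d_audio))   # stand-in encoder output

# Main pathway: aligned "language tokens" from the music input
audio_tokens = audio_feats @ W_proj                  # (n_frames, d_text)

# Auxiliary pathway: pooled feature logits, projected to a guidance token
key_logits = audio_feats.mean(axis=0) @ W_key_head   # (n_keys,)
key_token = key_logits @ W_key_proj                  # (d_text,)

# Concatenate the guidance token with the audio tokens before the LLM
tokens = np.vstack([audio_tokens, key_token[None, :]])
print(tokens.shape)  # (51, 4096)
```

The point of the auxiliary pathway is that the language model sees the predicted feature (here, key) as an extra token rather than having to infer it implicitly from the audio tokens alone.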

  • Developed by: AMAAI Lab
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Model type: Multi-modal Audio Text to Text model
  • Language(s) (NLP): English
  • License: Apache-2.0
  • Finetuned from model: mistralai/Mistral-7B-v0.1

Model Sources

Uses

The model can be used to generate paired music-text datasets.

How to Get Started with the Model

Use the instructions provided in the repository to run inference locally. Alternatively, try the model on the Spaces page.

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

Framework versions

  • PEFT 0.10.0