Model Card for SonicVerse
SonicVerse is a music captioning model. It is trained with concrete music-feature labels that guide the captioning process, so the generated captions describe attributes such as key, vocals, vocal gender, instruments, mood/theme, and genre. The model is trained on 10-second snippets of music to produce detailed captions; the Spaces demo chains the captions of consecutive 10-second chunks to generate a long, detailed caption for a full track.
Model Details
Model Description
SonicVerse is trained with a multi-task projector that maps music input to language tokens aligned with the language model. In addition, feature-extraction heads (e.g. key classification, vocals classification) are trained, and their outputs are projected to language tokens that guide the captioning.
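As a rough illustration of this design, the following PyTorch sketch shows one way such a multi-task projector could be structured. All names, dimensions, and head choices (MERT_DIM, LLM_DIM, the key/vocals heads) are illustrative assumptions, not the released implementation; see the repository for the actual code.

```python
import torch
import torch.nn as nn

MERT_DIM, LLM_DIM, N_TOKENS = 768, 4096, 8  # assumed sizes, for illustration

class MultiTaskProjector(nn.Module):
    """Sketch: music embeddings -> LLM-aligned tokens + auxiliary feature heads."""

    def __init__(self):
        super().__init__()
        # Main branch: project music features to a block of language tokens.
        self.caption_proj = nn.Linear(MERT_DIM, N_TOKENS * LLM_DIM)
        # Auxiliary heads trained on concrete feature labels.
        self.key_head = nn.Linear(MERT_DIM, 24)    # 12 tonics x major/minor
        self.vocals_head = nn.Linear(MERT_DIM, 2)  # vocals present / absent
        # Each auxiliary prediction is itself projected to a language token.
        self.key_token = nn.Linear(24, LLM_DIM)
        self.vocals_token = nn.Linear(2, LLM_DIM)

    def forward(self, music_emb: torch.Tensor):  # (batch, MERT_DIM)
        b = music_emb.size(0)
        main_tokens = self.caption_proj(music_emb).view(b, N_TOKENS, LLM_DIM)
        key_logits = self.key_head(music_emb)
        vocals_logits = self.vocals_head(music_emb)
        feature_tokens = torch.stack(
            [self.key_token(key_logits), self.vocals_token(vocals_logits)], dim=1
        )
        # The combined token block is prepended to the LLM prompt; the logits
        # are additionally supervised with the concrete feature labels.
        return torch.cat([main_tokens, feature_tokens], dim=1), key_logits, vocals_logits
```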
- Developed by: AMAAI Lab
- Model type: Multimodal audio-text-to-text model
- Language(s) (NLP): English
- License: Apache-2.0
- Finetuned from model: mistralai/Mistral-7B-v0.1 (language backbone), with a music encoder based on m-a-p/MERT-v1-95M
Model Sources
- Repository: https://github.com/annabeth97c/sonicverse
- Paper: [More Information Needed]
- Demo: https://annabeth97c.github.io/sonicverse/
Uses
The model can be used to generate paired music-text datasets, as in the sketch below.
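A minimal sketch of such a dataset-generation loop, assuming a hypothetical `caption_audio` helper that wraps the repository's inference entry point:

```python
import json
from pathlib import Path

def caption_audio(path: str) -> str:
    """Hypothetical stand-in for SonicVerse inference on one audio file."""
    raise NotImplementedError("wire this to the repository's inference script")

# Write one {audio, caption} record per file, in JSON Lines format.
with open("music_captions.jsonl", "w") as out:
    for wav in sorted(Path("music").glob("*.wav")):
        record = {"audio": str(wav), "caption": caption_audio(str(wav))}
        out.write(json.dumps(record) + "\n")
```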
How to Get Started with the Model
Follow the instructions in the repository to run inference locally, or try out the model on the Spaces demo.
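For orientation, here is a minimal sketch of the 10-second chunk-and-chain workflow described above, assuming a hypothetical `caption_chunk` function wired to the repository's inference code (the demo may merge chunk captions differently):

```python
import librosa

def caption_chunk(audio, sr) -> str:
    """Hypothetical stand-in for SonicVerse inference on a 10-second snippet."""
    raise NotImplementedError("replace with the repository's inference call")

audio, sr = librosa.load("song.mp3", sr=None, mono=True)
chunk = 10 * sr  # the model is trained on 10-second snippets
captions = [caption_chunk(audio[i:i + chunk], sr)
            for i in range(0, len(audio), chunk)]
print(" ".join(captions))  # naive chaining of per-chunk captions
```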
Citation
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Model Card Contact
[More Information Needed]
Framework versions
- PEFT 0.10.0