Model Card for SonicVerse
SonicVerse is a music captioning model. It is trained with concrete music-feature labels that guide the captioning process, so the generated captions describe attributes such as key, vocals, vocal gender, instruments, mood/theme, and genre. The model is trained on 10-second snippets of music to produce detailed captions; the Spaces demo chains the captions of consecutive 10-second chunks to generate a long, detailed caption for a full track.
Model Details
Model Description
SonicVerse is trained with a multi-task projector that maps music input to language tokens aligned with the language model. In addition, feature-extraction heads (e.g. key classification, vocals classification) are trained, and their outputs are projected to language tokens that guide the captioning.
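As a rough illustration of this design, the following PyTorch sketch shows one way such a multi-task projector could be structured. All names, dimensions, and head choices (MERT_DIM, LLM_DIM, the key/vocals heads) are illustrative assumptions, not the released implementation; see the repository for the actual code.

```python
import torch
import torch.nn as nn

MERT_DIM, LLM_DIM, N_TOKENS = 768, 4096, 8  # assumed sizes, for illustration

class MultiTaskProjector(nn.Module):
    """Sketch: music embeddings -> LLM-aligned tokens + auxiliary feature heads."""

    def __init__(self):
        super().__init__()
        # Main branch: project music features to a block of language tokens.
        self.caption_proj = nn.Linear(MERT_DIM, N_TOKENS * LLM_DIM)
        # Auxiliary heads trained on concrete feature labels.
        self.key_head = nn.Linear(MERT_DIM, 24)    # 12 tonics x major/minor
        self.vocals_head = nn.Linear(MERT_DIM, 2)  # vocals present / absent
        # Each auxiliary prediction is itself projected to a language token.
        self.key_token = nn.Linear(24, LLM_DIM)
        self.vocals_token = nn.Linear(2, LLM_DIM)

    def forward(self, music_emb: torch.Tensor):  # (batch, MERT_DIM)
        b = music_emb.size(0)
        main_tokens = self.caption_proj(music_emb).view(b, N_TOKENS, LLM_DIM)
        key_logits = self.key_head(music_emb)
        vocals_logits = self.vocals_head(music_emb)
        feature_tokens = torch.stack(
            [self.key_token(key_logits), self.vocals_token(vocals_logits)], dim=1
        )
        # The combined token block is prepended to the LLM prompt; the logits
        # are additionally supervised with the concrete feature labels.
        return torch.cat([main_tokens, feature_tokens], dim=1), key_logits, vocals_logits
```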
- Developed by: AMAAI Lab
- Model type: Multimodal audio-text-to-text model
- Language(s) (NLP): English
- License: Apache-2.0
- Finetuned from model: mistralai/Mistral-7B-v0.1 (language backbone), with a music encoder based on m-a-p/MERT-v1-95M
Model Sources
- Repository: https://github.com/annabeth97c/sonicverse
- Paper: [More Information Needed]
- Demo: https://annabeth97c.github.io/sonicverse/
Uses
The model can be used to generate paired music-text datasets, as in the sketch below.
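A minimal sketch of such a dataset-generation loop, assuming a hypothetical `caption_audio` helper that wraps the repository's inference entry point:

```python
import json
from pathlib import Path

def caption_audio(path: str) -> str:
    """Hypothetical stand-in for SonicVerse inference on one audio file."""
    raise NotImplementedError("wire this to the repository's inference script")

# Write one {audio, caption} record per file, in JSON Lines format.
with open("music_captions.jsonl", "w") as out:
    for wav in sorted(Path("music").glob("*.wav")):
        record = {"audio": str(wav), "caption": caption_audio(str(wav))}
        out.write(json.dumps(record) + "\n")
```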
How to Get Started with the Model
Follow the instructions in the repository to run inference locally, or try out the model on the Spaces demo.
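For orientation, here is a minimal sketch of the 10-second chunk-and-chain workflow described above, assuming a hypothetical `caption_chunk` function wired to the repository's inference code (the demo may merge chunk captions differently):

```python
import librosa

def caption_chunk(audio, sr) -> str:
    """Hypothetical stand-in for SonicVerse inference on a 10-second snippet."""
    raise NotImplementedError("replace with the repository's inference call")

audio, sr = librosa.load("song.mp3", sr=None, mono=True)
chunk = 10 * sr  # the model is trained on 10-second snippets
captions = [caption_chunk(audio[i:i + chunk], sr)
            for i in range(0, len(audio), chunk)]
print(" ".join(captions))  # naive chaining of per-chunk captions
```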
Citation
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Model Card Contact
[More Information Needed]
Framework versions
- PEFT 0.10.0