
CantusSVS

Table of Contents

  • About CantusSVS
  • Quick Start
  • Preparing Your Input
  • Running Locally
  • FAQ

About CantusSVS

CantusSVS is a singing voice synthesis tool that automatically generates audio playback for the Latin chants in Cantus. You can use CantusSVS directly in the browser at https://cantussvs.streamlit.app. For training and inference, we use DiffSinger, a diffusion-based singing voice synthesis model described in the paper below:

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

Liu, Jinglin, Chengxi Li, Yi Ren, Feiyang Chen, and Zhou Zhao. 2022. "DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism." In Proceedings of the AAAI Conference on Artificial Intelligence 36 (10): 11020–11028. https://arxiv.org/abs/2105.02446.

Training was done on Cedar, a compute cluster provided by the Digital Research Alliance of Canada. To set up training locally, follow this tutorial by tigermeat.

For general help with training and creating a dataset, this tutorial by PixPrucer is an excellent guide. For further questions, join the DiffSinger Discord server.

The dataset used for this project was built from Adventus: Dominica prima adventus Domini, the first track on Psallentes' album Salzinnes Saints. Psallentes is a Belgian women's chorus that specializes in Late Medieval and Renaissance music. Salzinnes Saints is an album of music from the Salzinnes Antiphonal, a mid-sixteenth-century choirbook containing the music and text for the Liturgy of the Hours.


Quick Start

  1. Clone the repository:

    git clone https://github.com/yourusername/CantusSVS.git
    cd CantusSVS
    
  2. Set up the environment:

    make setup
    
  3. Run the web app locally:

    make run
    
  4. Open your browser at:

    http://localhost:8501
    

Or just use the hosted app here: https://cantussvs.streamlit.app


Preparing Your Input

  • Input format must be .mei (Music Encoding Initiative XML).
  • Most music notation software can export .mei files; MuseScore 4 is free to use.
  • Only monophonic scores are supported (one staff, one voice).
  • Lyrics must be embedded in the MEI file and aligned with notes.

Validation tool:

    python scripts/validate_mei.py your_song.mei
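
The bundled script is the authoritative check. For a sense of what "one staff, one voice, lyrics attached to notes" looks like structurally, here is a minimal standalone sketch (not the repository's validate_mei.py) that walks an MEI file with Python's standard library. The namespace URI is the standard MEI one and the pname/oct attribute names follow the MEI schema; everything else is illustrative.

    # Illustrative only; not the repository's validate_mei.py.
    import sys
    import xml.etree.ElementTree as ET

    MEI_NS = "{http://www.music-encoding.org/ns/mei}"  # standard MEI namespace

    def check(path):
        root = ET.parse(path).getroot()

        # Monophonic input: expect exactly one staff definition.
        staves = root.findall(f".//{MEI_NS}staffDef")
        print(f"staff definitions: {len(staves)} (expected 1)")

        # Print note / syllable pairs to eyeball lyric alignment.
        for note in root.iter(f"{MEI_NS}note"):
            pitch = note.get("pname", "?") + note.get("oct", "?")
            syls = [s.text or "" for s in note.iter(f"{MEI_NS}syl")]
            print(pitch, "-".join(syls) if syls else "(no lyric)")

    if __name__ == "__main__":
        check(sys.argv[1])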

Running Locally

  1. Drop your .mei file into the upload area of the web app.

  2. Choose settings:

    • Tempo (BPM)
    • Output file name (optional)
  3. Hit "Synthesize" and download the resulting .wav file.

Generated files:

  • .wav: final audio output
  • .mel.npy: intermediate mel-spectrogram
  • .info.json: metadata (phoneme sequence, note mapping)
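
If you want to inspect the intermediate files, the sketch below loads them with numpy and the standard json module. The array shape and the JSON keys depend on the model configuration, so the sketch only reports what it finds; the file names are placeholders for whatever output name you chose.

    # Inspect the intermediate outputs of a synthesis run (file names are examples).
    import json
    import numpy as np

    mel = np.load("your_song.mel.npy")          # mel-spectrogram, typically (frames, mel_bins)
    print("mel shape:", mel.shape, "dtype:", mel.dtype)

    with open("your_song.info.json", encoding="utf-8") as f:
        info = json.load(f)
    print("metadata keys:", list(info.keys()))  # e.g. phoneme sequence, note mapping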

FAQ

Q: Can I synthesize polyphonic (multi-voice) chants?
A: No, only monophonic scores are supported currently. However, in the future, polyphonic chants could be synthesized by layering multiple monophonic voices.
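
Building on that answer, here is a rough sketch of the layering idea: render each voice separately as a monophonic part, then mix the resulting .wav files. It uses numpy plus the third-party soundfile library (an arbitrary choice; any WAV I/O library works), assumes all parts are mono and share one sample rate, and uses placeholder file names.

    # Mix several synthesized monophonic parts into one polyphonic track.
    import numpy as np
    import soundfile as sf  # third-party: pip install soundfile

    parts = ["cantus.wav", "altus.wav", "tenor.wav"]   # placeholder file names
    signals, rates = zip(*(sf.read(p) for p in parts))
    assert len(set(rates)) == 1, "all parts must share the same sample rate"

    # Pad every part to the length of the longest one, then sum.
    length = max(len(s) for s in signals)
    mix = np.zeros(length)
    for s in signals:
        mix[: len(s)] += s

    # Normalize so the mix stays within [-1, 1].
    mix /= max(1.0, float(np.abs(mix).max()))
    sf.write("polyphonic_mix.wav", mix, rates[0])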

Q: Can I change the voice timbre?
A: In the web app, only the provided pre-trained model is available. However, DiffSinger learns the timbre of the dataset it is trained on, so if you train your own model, you can control the timbre that way.