---
title: CantusSVS
emoji: 🕊️
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: "1.32.2"
app_file: app.py
pinned: false
---


# CantusSVS

## Table of Contents
- [About CantusSVS](#about-cantussvs)
- [Preparing Your Input](#preparing-your-input)
- [Running Locally](#running-locally)
- [FAQ](#faq)

---

## About CantusSVS

CantusSVS is a singing voice synthesis tool that automatically generates audio playback for the Latin chants in Cantus. You can access CantusSVS directly in the browser here [**https://cantussvs.streamlit.app**](https://cantussvs.streamlit.app). For training and inferencing, we use **DiffSinger**, a diffusion-based singing voice synthesis model described in the paper below:

**DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism**  

Liu, Jinglin, Chengxi Li, Yi Ren, Feiyang Chen, and Zhou Zhao. 2022. "Diffsinger: Singing Voice Synthesis via Shallow Diffusion Mechanism." In *Proceedings of the AAAI Conference on Artificial Intelligence* 36 10: 11020–11028. [https://arxiv.org/abs/2105.02446](http://dx.doi.org/10.1609/aaai.v36i10.21350).

Training was done using Cedar, a cluster provided by the Digital Research Alliance of Canada. To set up training locally, follow [this tutorial](https://youtu.be/Sxt11TAflV0?feature=shared) by [tigermeat](https://www.youtube.com/@spicytigermeat).

For general help training and creating a dataset, [this tutorial](https://docs.google.com/document/d/1uMsepxbdUW65PfIWL1pt2OM6ZKa5ybTTJOpZ733Ht6s/view) by [PixPrucer](https://bsky.app/profile/pixprucer.bsky.social) is an excellent guide. For help, join the [DiffSinger Discord server](https://discord.gg/DZ6fhEUfnb).

The dataset used for this project was built using [*Adventus: Dominica prima adventus Domini*](https://youtu.be/ThnPySybDJs?feature=shared), the first track from [Psallentes](https://psallentes.com/)' album *Salzinnes Saints*. Psallentes is a Belgian women's chorus that specializes in Late Medieval and Renaissance music. *Salzinnes Saints* is an album of music from the [Salzinnes Antiphonal](https://www.smu.ca/academics/archives/the-salzinnes-antiphonal.html), a mid-sixteenth century choirbook with the music and text for the Liturgy of the Hours.

---

## Preparing Your Input

- Most commercial music composition software can export `.mei` files. MuseScore 4 is free to use.
- Input format must be `.mei` (Music Encoding Initiative XML).
- Only **monophonic** scores are supported (one staff, one voice).
- Lyrics must be embedded in the MEI file and aligned with notes.

Validation tool:

```bash
python scripts/validate_mei.py your_song.mei
```

---

## Running Locally

1. Drop your `.mei` file into the upload area of the web app.

2. Choose settings:
   - Tempo (BPM)
   - Output file name (optional)

3. Hit "Synthesize" and download the resulting `.wav` file.

Generated files:
- `.wav`: final audio output
- `.mel.npy`: intermediate mel-spectrogram
- `.info.json`: metadata (phoneme sequence, note mapping)

---

## FAQ

**Q: Can I synthesize polyphonic (multi-voice) chants?**  
A: No, only monophonic scores are supported currently. However, in the future, polyphonic chants could be synthesized by layering multiple monophonic voices.

**Q: Can I change the voice timbre?**  
A: In the webapp, only the provided pre-trained model is available. However, DiffSinger will learn the timbre of the input dataset so if you train your own model, you can control the timbre that way.

---