# Getting Started

## Installation

### Environments and dependencies

DiffSinger requires Python 3.8 or later. We strongly recommend creating a virtual environment via Conda or venv before installing dependencies.
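
For example, a minimal environment setup with venv might look like this (the environment name is arbitrary):

```bash
# Create and activate an isolated environment (venv shown; Conda works equally well)
python3 -m venv diffsinger-env
source diffsinger-env/bin/activate

# Confirm the interpreter meets the minimum version requirement
python --version  # should report Python 3.8 or later
```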

1. Install the latest PyTorch following the [official instructions](https://pytorch.org/get-started/locally/) according to your OS and hardware.

2. Install other dependencies via the following command:

   ```bash
   pip install -r requirements.txt
   ```

### Materials and assets

Some essential materials and assets are needed before continuing with this repository. See [materials for training and using models](BestPractices.md#materials-for-training-and-using-models) for detailed instructions.

## Configuration

Every model needs a configuration file to run preprocessing, training, inference and deployment. Templates of configuration files are in [configs/templates](../configs/templates). Please **copy** the templates to your own data directory before you edit them.
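
For instance, copying the templates into a hypothetical data directory could look like this (paths are illustrative; adjust them to your dataset location):

```bash
# Copy the templates into your own data directory, then edit the copies there
mkdir -p data/my_singer
cp configs/templates/*.yaml data/my_singer/
```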

Before you continue, it is highly recommended to read through [Best Practices](BestPractices.md), which is a more detailed tutorial on how to configure your experiments.

For more details about configurable parameters, see [Configuration Schemas](ConfigurationSchemas.md).

> Tip: to see which parameters are required or recommended to edit, search for _customizability_ in the configuration schemas.

## Preprocessing

Raw data pieces and transcriptions must be binarized into dataset files before training. Before this step, please ensure that all required configurations like `raw_data_dir` and `binary_data_dir` are set properly, and that all desired features and functionalities are enabled and configured.
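
As a sketch, the relevant keys in a configuration file might look like this (the key names are the ones mentioned above; the paths are hypothetical):

```yaml
raw_data_dir: data/my_singer/raw        # where the raw data pieces and transcriptions live
binary_data_dir: data/my_singer/binary  # where the binarized dataset files will be written
```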

Assume that you have a configuration file called `my_config.yaml`. Run:

```bash
python scripts/binarize.py --config my_config.yaml
```

Preprocessing can be accelerated through multiprocessing. See [binarization_args.num_workers](ConfigurationSchemas.md#binarization_args.num_workers) for details.
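
For example, enabling multiprocessing in the configuration file might look like this (the worker count is illustrative; choose it based on your CPU):

```yaml
binarization_args:
  num_workers: 4  # number of parallel worker processes used during binarization
```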

## Training

Assume that you have a configuration file called `my_config.yaml` and the name of your model is `my_experiment`. Run:

```bash
python scripts/train.py --config my_config.yaml --exp_name my_experiment --reset
```

Checkpoints will be saved in the `checkpoints/my_experiment/` directory. If training is interrupted, running the above command again resumes it automatically from the latest checkpoint.

For more suggestions related to training performance, see [performance tuning](BestPractices.md#performance-tuning).

### TensorBoard

Run the following command to start TensorBoard:

```bash
tensorboard --logdir checkpoints/
```

> NOTICE
> 
> If you are training a model with multiple GPUs (DDP), please add the `--reload_multifile=true` option when launching TensorBoard; otherwise it may not update properly.

## Inference

Inference of DiffSinger is based on DS files. Assume that you have a DS file named `my_song.ds` and your model is named `my_experiment`.

If your model is a variance model, run:

```bash
python scripts/infer.py variance my_song.ds --exp my_experiment
```

or run

```bash
python scripts/infer.py variance --help
```

for more configurable options.

If your model is an acoustic model, run:

```bash
python scripts/infer.py acoustic my_song.ds --exp my_experiment
```

or run

```bash
python scripts/infer.py acoustic --help
```

for more configurable options.

## Deployment

DiffSinger uses [ONNX](https://onnx.ai/) as the deployment format.

Due to TorchScript issues, exporting to ONNX currently requires PyTorch **1.13**. Please set up the correct dependencies through the following steps:

1. Create a new, separate environment for exporting ONNX.

2. Install PyTorch 1.13 following the [official instructions](https://pytorch.org/get-started/previous-versions/). A CPU-only version is enough.

3. Install other dependencies via the following command:

   ```bash
   pip install -r requirements-onnx.txt
   ```
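
Assuming a venv-based workflow, the three steps above might be sketched as follows (the environment name is arbitrary; the CPU wheel index URL follows the official PyTorch previous-versions instructions):

```bash
# 1. A separate environment dedicated to ONNX export
python3 -m venv onnx-export-env
source onnx-export-env/bin/activate

# 2. CPU-only PyTorch 1.13 (sufficient for export)
pip install torch==1.13.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu

# 3. Remaining export dependencies
pip install -r requirements-onnx.txt
```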

Assume that you have a model named `my_experiment`.

If your model is a variance model, run:

```bash
python scripts/export.py variance --exp my_experiment
```

or run

```bash
python scripts/export.py variance --help
```

for more configurable options.

If your model is an acoustic model, run:

```bash
python scripts/export.py acoustic --exp my_experiment
```

or run

```bash
python scripts/export.py acoustic --help
```

for more configurable options.

To export an NSF-HiFiGAN vocoder checkpoint, run:

```bash
python scripts/export.py nsf-hifigan --config CONFIG --ckpt CKPT
```

where `CONFIG` is a configuration file that uses the same mel parameters as the vocoder (`configs/acoustic.yaml` works in most cases) and `CKPT` is the path to the checkpoint to be exported.

For more configurable options, run

```bash
python scripts/export.py nsf-hifigan --help
```

## Other utilities

There are other useful CLI tools in the [scripts/](../scripts) directory not mentioned above:

- `drop_spk.py` - deletes speaker embeddings from checkpoints (for data security when distributing models)
- `vocoder.py` - bypasses the acoustic model and runs only the vocoder on given mel-spectrograms