Upload folder using huggingface_hub
README.md
````diff
@@ -17,7 +17,7 @@ This model provides Milady-styled speech synthesis, capturing the distinctive vo
 
 ## Intended Use
 
-The
+The Volady model is designed to:
 - Generate speech in the unique Milady voice style
 - Create playful and creative speech responses to text prompts
 - Emulate Milady's distinctive personality through speech
@@ -34,13 +34,13 @@ The model was fine-tuned on a curated dataset of Milady-style speech examples fr
 First, install the required packages using uv (recommended for faster installation):
 
 ```bash
-uv pip install torch torchaudio snac transformers
+uv pip install torch torchaudio snac transformers accelerate soundfile
 ```
 
 Or using standard pip:
 
 ```bash
-pip install torch torchaudio snac transformers
+pip install torch torchaudio snac transformers accelerate soundfile
 ```
 
 ## Usage
@@ -50,6 +50,7 @@ pip install torch torchaudio snac transformers
 Here's a complete script to generate speech with the model:
 
 ```python
+
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 import torchaudio
@@ -165,10 +166,10 @@ audio_samples = redistribute_codes(code_list)
 
 if audio_samples is not None:
     # Save the audio to a WAV file
-    output_path = "milady_speech.wav"
+    output_path = os.path.join(os.getcwd(), "milady_speech.wav")
     audio_numpy = audio_samples.detach().squeeze().numpy()
     # The sampling rate of 24000 Hz is crucial for correct playback speed
-    torchaudio.save(output_path, torch.tensor(audio_numpy).unsqueeze(0), 24000)
+    torchaudio.save(output_path, torch.tensor(audio_numpy).unsqueeze(0), 24000, format="wav")
     print(f"Audio saved to {output_path}")
 else:
     print("Failed to generate audio")
````
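The updated install commands add `accelerate` and `soundfile` to the dependency list. A minimal sketch, assuming you want to confirm all six packages are present before running the generation script: it uses only the standard library's `importlib.util.find_spec`, and the helper name `missing_packages` is hypothetical, not something the README defines.

```python
import importlib.util

# Import names for the packages in the updated install command.
# Assumption: each pip name matches its import name, which holds for these six.
REQUIRED = ["torch", "torchaudio", "snac", "transformers", "accelerate", "soundfile"]


def missing_packages(names):
    """Return the subset of `names` that cannot be located on the import path.

    For top-level names, find_spec locates the module without executing it,
    so this is a cheap presence check, not proof that the package imports cleanly.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages: " + " ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Checking for presence rather than importing keeps the check fast, since importing `torch` or `transformers` can take several seconds.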
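The save step writes mono audio whose header declares 24000 Hz. Playback length is `num_samples / sample_rate`, so the same 24000 samples last 1 s at 24 kHz but only 0.5 s if the header claims 48 kHz, which is why the rate must match the decoder's output. As a dependency-light sketch, not the README's method (which uses `torchaudio.save`), the standard library's `wave` module can write float samples in [-1, 1] as 16-bit PCM to an absolute path built the same way as the updated `output_path`. The helper names `save_wav_24k` and `duration_seconds` and the 16-bit conversion are assumptions for illustration.

```python
import math
import os
import struct
import wave

SAMPLE_RATE = 24000  # rate used in the README; the WAV header must match it


def save_wav_24k(samples, filename, sample_rate=SAMPLE_RATE):
    """Write mono float samples (expected in [-1.0, 1.0]) as 16-bit PCM WAV.

    Returns the absolute output path, built like the README's
    os.path.join(os.getcwd(), ...).
    """
    output_path = os.path.join(os.getcwd(), filename)
    # Clip to [-1, 1], then scale to the signed 16-bit range (truncating toward zero).
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack("<%dh" % len(ints), *ints)  # WAV PCM is little-endian
    with wave.open(output_path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return output_path


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Playback length implied by the sample rate written in the header."""
    return num_samples / sample_rate


if __name__ == "__main__":
    # One second of a 440 Hz tone as a stand-in for real model output.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
    path = save_wav_24k(tone, "test_tone.wav")
    print(f"Audio saved to {path}")
```

To try it with the README's script, one could pass `audio_numpy.tolist()` in place of the `torchaudio.save` call; this substitution is untested against the actual model output.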
|