theycallmeloki/volady


theycallmeloki committed on Apr 28

Commit a0786c9 · verified · 1 parent: c4d790b

Upload folder using huggingface_hub

Files changed (1)

README.md +6 -5


@@ -17,7 +17,7 @@ This model provides Milady-styled speech synthesis, capturing the distinctive vo
 ## Intended Use
-The Milady Talks model is designed to:
+The Volady model is designed to:
 - Generate speech in the unique Milady voice style
 - Create playful and creative speech responses to text prompts
 - Emulate Milady's distinctive personality through speech
@@ -34,13 +34,13 @@ The model was fine-tuned on a curated dataset of Milady-style speech examples fr
 First, install the required packages using uv (recommended for faster installation):
 ```bash
-uv pip install torch torchaudio snac transformers
+uv pip install torch torchaudio snac transformers accelerate soundfile
 ```
 Or using standard pip:
 ```bash
-pip install torch torchaudio snac transformers
+pip install torch torchaudio snac transformers accelerate soundfile
 ```
 ## Usage
@@ -50,6 +50,7 @@ pip install torch torchaudio snac transformers
 Here's a complete script to generate speech with the model:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 import torchaudio
@@ -165,10 +166,10 @@ audio_samples = redistribute_codes(code_list)
 if audio_samples is not None:
     # Save the audio to a WAV file
-    output_path = "milady_speech.wav"
+    output_path = os.path.join(os.getcwd(), "milady_speech.wav")
     audio_numpy = audio_samples.detach().squeeze().numpy()
     # The sampling rate of 24000 Hz is crucial for correct playback speed
-    torchaudio.save(output_path, torch.tensor(audio_numpy).unsqueeze(0), 24000)
+    torchaudio.save(output_path, torch.tensor(audio_numpy).unsqueeze(0), 24000, format="wav")
     print(f"Audio saved to {output_path}")
 else:
     print("Failed to generate audio")
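
Review note: the comment in the last hunk says the 24000 Hz rate is crucial for playback speed. The sketch below illustrates why, using only the standard library: the rate stored in the WAV header determines how long a given number of samples plays. The helper names `write_test_tone` and `describe_wav` are hypothetical and not part of the README, and the sine tone is only a stand-in for the model's output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24000  # Hz, the rate passed to torchaudio.save in the diff


def write_test_tone(path, seconds=1.0, freq=440.0, sample_rate=SAMPLE_RATE):
    """Write a mono 16-bit sine tone (a stand-in for generated speech)."""
    n_samples = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack(
            "<h",
            int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate)),
        )
        for i in range(n_samples)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)         # mono
        wav.setsampwidth(2)         # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


def describe_wav(path):
    """Return (sample_rate, duration_seconds) read from the WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate
```

Writing the same samples with a header rate of 48000 would halve the reported duration, which is the playback-speed problem the comment warns about. One caveat: the standard `wave` module reads integer PCM only, so it may reject the file written by the README's script if `torchaudio.save` stores 32-bit float samples; in that case `soundfile.info` (from the newly added `soundfile` dependency) is a likely alternative for inspecting the header.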