This model is a fine-tuned version of sesame/csm-1b trained on the Elise dataset. Sample output files are included in the repository.
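The exact sample filenames aren't listed here, so the sketch below (an assumption, not part of the original card) simply enumerates the repository's audio files with huggingface_hub and downloads the first one:

```python
from huggingface_hub import hf_hub_download, list_repo_files

repo_id = "keanteng/sesame-csm-elise"

# List repository files and keep likely audio samples
samples = [f for f in list_repo_files(repo_id) if f.endswith((".wav", ".mp3", ".flac"))]
print(samples)

# Download one sample locally; hf_hub_download returns the cached file path
if samples:
    print(hf_hub_download(repo_id, samples[0]))
```

To synthesize speech yourself, load the model through transformers: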
```python
from transformers import CsmForConditionalGeneration, AutoProcessor
import torch
import soundfile as sf

# Load the model and its processor
model_id = "keanteng/sesame-csm-elise"  # Replace with your model
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)

# Simple text generation: the role is the speaker id ("0" for a single speaker)
conversation = [
    {"role": "0", "content": [{"type": "text", "text": "Hello, this is a test!"}]}
]
inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)

# Generate audio; output_audio=True returns decoded waveforms instead of token ids
audio = model.generate(**inputs, output_audio=True)

# Take the first waveform in the batch and convert it to a NumPy array
audio_cpu = audio[0].to(torch.float32).cpu().numpy()

# Save to file (CSM generates 24 kHz audio)
sf.write("output.wav", audio_cpu, 24000)
```
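`generate()` also accepts the usual transformers sampling controls. The values below are illustrative assumptions, not tuned recommendations for this checkpoint:

```python
# A minimal sketch: standard generation kwargs pass through model.generate().
audio = model.generate(
    **inputs,
    output_audio=True,
    do_sample=True,      # sample instead of greedy decoding for more varied prosody
    temperature=0.8,     # lower values give more deterministic delivery
    max_new_tokens=250,  # caps the number of generated audio frames (clip length)
)
```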
Base model: sesame/csm-1b