A newer version of this model is available: keanteng/sesame-csm-elise-lora

CSM Elise Voice Model

This model is a fine-tuned version of sesame/csm-1b, trained on the Elise dataset. Sample output files are available in the repository.

Model Details

  • Base Model: sesame/csm-1b
  • Training Data: MrDragonFox/Elise dataset
  • Fine-tuning Approach: Voice cloning through conditional speech generation
  • Voice Characteristics: [Describe voice qualities]
  • Training Parameters:
    • Learning Rate: 2e-5
    • Epochs: 3
    • Batch Size: 1 with gradient accumulation steps of 4
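
The batch size and gradient accumulation values listed above imply an effective batch size of 4 samples per optimizer step. A minimal sketch of that arithmetic follows. The `FineTuneConfig` dataclass and its field names are hypothetical, chosen only for illustration; they are not taken from the original training script, and a single training device is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed in this card."""

    learning_rate: float = 2e-5
    epochs: int = 3
    batch_size: int = 1
    gradient_accumulation_steps: int = 4

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer step
        # (assumes one device; multiply by device count otherwise).
        return self.batch_size * self.gradient_accumulation_steps


config = FineTuneConfig()
print(config.effective_batch_size)
```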

Quick Start

from transformers import CsmForConditionalGeneration, AutoProcessor
import torch
import soundfile as sf

# Load the model
model_id = "keanteng/sesame-csm-elise"  # Replace with your model
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = CsmForConditionalGeneration.from_pretrained(model_id, device_map=device)

Basic Text-to-Speech

# Simple text generation
conversation = [
    {"role": "0", "content": [{"type": "text", "text": "Hello, this is a test!"}]}
]

inputs = processor.apply_chat_template(
    conversation,
    tokenize=True,
    return_dict=True,
).to(device)

# Generate audio
audio = model.generate(**inputs, output_audio=True)
audio_cpu = audio[0].to(torch.float32).cpu().numpy()

# Save to file
sf.write("output.wav", audio_cpu, 24000)
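
The conversation structure used above can be generated for arbitrary text with a small helper. This is only a sketch: `build_conversation` and `clip_duration_seconds` are hypothetical helper names, not part of `transformers`, and the 24000 Hz default is taken from the `sf.write` call above.

```python
def build_conversation(text: str, speaker_id: int = 0) -> list[dict]:
    """Wrap one utterance in the chat format used in the example above.

    The speaker id is stored as a string in the "role" field, mirroring
    the {"role": "0", ...} entry in the hand-written conversation.
    """
    return [
        {"role": str(speaker_id), "content": [{"type": "text", "text": text}]}
    ]


def clip_duration_seconds(num_samples: int, sample_rate: int = 24000) -> float:
    """Length of a generated clip in seconds at the given sample rate."""
    return num_samples / sample_rate
```

The list returned by `build_conversation("Hello, this is a test!")` can be passed to `processor.apply_chat_template` in place of the hand-written `conversation`, and `clip_duration_seconds(len(audio_cpu))` gives the length of the saved clip.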

Model size: 1.65B params (Safetensors, tensor type F32)